Bioinformatics Systems, Apparatuses, and Methods Executed on a Quantum Processing Platform

ABSTRACT

A system, method and apparatus for executing a bioinformatics analysis on genetic sequence data includes a quantum computing device formed of a set of hardwired quantum logic circuits interconnected by a plurality of superconducting connections to process information represented as a quantum state that is configured as a set of one or more qubits. The hardwired quantum logic circuits may be arranged as a set of processing engines, each processing engine being formed of a subset of the hardwired quantum logic circuits to perform one or more steps in the bioinformatics analysis on the reads of genomic data. Each subset of the hardwired quantum logic circuits may be formed in a wired configuration to perform the one or more steps in the bioinformatics analysis.

CROSS REFERENCES TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 15/497,149, filed Apr. 25, 2017, which claims the benefit of U.S. Provisional Patent Application No. 62/462,869, filed Feb. 23, 2017, the disclosure of each of which is incorporated herein by reference in its entirety for all purposes.

FIELD OF THE DISCLOSURE

The subject matter described herein relates to bioinformatics, and more particularly to systems, apparatuses, and methods for implementing bioinformatic protocols, such as performing one or more functions for analyzing genomic data on an integrated circuit, such as on a hardware processing platform.

BACKGROUND TO THE DISCLOSURE

As described in detail herein, some of the major computational challenges for high-throughput DNA sequencing analysis are addressing the explosive growth in available genomic data, the need for increased accuracy and sensitivity when gathering that data, and the need for fast, efficient, and accurate computational tools when performing analysis on a wide range of sequencing data sets derived from such genomic data.

Keeping pace with the increased sequencing throughput generated by Next Gen Sequencers has typically meant running multithreaded software tools on ever greater numbers of faster processors in computer clusters with expensive high availability storage that requires substantial power and significant IT support costs. Importantly, future increases in sequencing throughput rates will translate into accelerating real dollar costs for these secondary processing solutions.

The devices, systems, and methods of their use described herein are provided, at least in part, so as to address these and other such challenges.

SUMMARY OF THE DISCLOSURE

The present disclosure is directed to devices, systems, and methods for employing the same in the performance of one or more genomics and/or bioinformatics protocols on data generated through a primary processing procedure, such as on genetic sequence data. For instance, in various aspects, the devices, systems, and methods herein provided are configured for performing secondary and/or tertiary analysis protocols on genetic data, such as data generated by the sequencing of RNA and/or DNA, e.g., by a Next Gen Sequencer (“NGS”). In particular embodiments, one or more secondary processing pipelines for processing genetic sequence data are provided. In other embodiments, one or more tertiary processing pipelines for processing genetic sequence data are provided, such as where the pipelines, and/or individual elements thereof, deliver superior sensitivity and improved accuracy on a wider range of sequence derived data than is currently available in the art.

For example, provided herein is a system, such as for executing one or more of a sequence and/or genomic analysis pipeline on genetic sequence data and/or other data derived therefrom. In various embodiments, the system may include one or more of an electronic data source that provides digital signals representing a plurality of reads of genetic and/or genomic data, such as where each of the plurality of reads of genomic data includes a sequence of nucleotides. The system may further include a memory, e.g., a DRAM, or a cache, such as for storing one or more of the sequenced reads, one or a plurality of genetic reference sequences, and one or more indices of the one or more genetic reference sequences. The system may additionally include one or more integrated circuits, such as an FPGA, ASIC, or sASIC, and/or a CPU and/or a GPU, which integrated circuit, e.g., with respect to the FPGA, ASIC, or sASIC, may be formed of a set of hardwired digital logic circuits that are interconnected by a plurality of physical electrical interconnects. The system may additionally include a quantum computing processing unit, for use in implementing one or more of the methods disclosed herein.

In various embodiments, one or more of the plurality of electrical interconnects may include an input to the one or more integrated circuits that may be connected or connectable, e.g., directly, via a suitable wired connection, or indirectly such as via a wireless network connection (for instance, a cloud or hybrid cloud), with the electronic data source. Regardless of the nature of the connection with the sequencer, an integrated circuit of the disclosure may be configured for receiving the plurality of reads of genomic data, e.g., directly from the sequencer or from an associated memory. The reads may be digitally encoded in a standard FASTQ or BCL file format. Accordingly, the system may include an integrated circuit having one or more electrical interconnects that may be a physical interconnect that includes a memory interface so as to allow the integrated circuit to access the memory.
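
For illustration only, a minimal Python sketch of parsing reads from a standard FASTQ file; the file name and the simple four-line record handling are assumptions for illustration, not part of the disclosed hardware:

    # Minimal FASTQ reader sketch: each record spans four lines
    # (identifier, nucleotide sequence, '+' separator, per-base qualities).
    def read_fastq(path):
        with open(path) as handle:
            while True:
                header = handle.readline().rstrip()
                if not header:
                    return
                sequence = handle.readline().rstrip()
                handle.readline()                      # '+' separator line
                qualities = handle.readline().rstrip()
                yield header[1:], sequence, qualities

    # Hypothetical usage: iterate over the reads produced by a sequencer.
    for name, seq, qual in read_fastq("sample_reads.fastq"):
        pass  # each 'seq' is one read of genomic data, one nucleotide per character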

Particularly, the hardwired digital logic circuit of the integrated circuit may be arranged as a set of processing engines, such as where each processing engine may be formed of a subset of the hardwired digital logic circuits so as to perform one or more steps in the sequence, genomic, and/or tertiary analysis pipeline, as described herein below, on the plurality of reads of genetic data as well as on other data derived therefrom. For instance, each subset of the hardwired digital logic circuits may be in a wired configuration to perform the one or more steps in the analysis pipeline. Additionally, where the integrated circuit is an FPGA, such steps in the sequence and/or further analysis process may involve the partial reconfiguration of the FPGA during the analysis process.

Particularly, the set of processing engines may include a mapping module, e.g., in a wired configuration, to access, according to at least some of the sequence of nucleotides in a read of the plurality of reads, the index of the one or more genetic reference sequences, from the memory via the memory interface, so as to map the read to one or more segments of the one or more genetic reference sequences based on the index. Additionally, the set of processing engines may include an alignment module in the wired configuration to access the one or more genetic reference sequences from the memory via the memory interface to align the read, e.g., the mapped read, to one or more positions in the one or more segments of the one or more genetic reference sequences, e.g., as received from the mapping module and/or stored in the memory.

Further, the set of processing engines may include a sorting module so as to sort each aligned read according to the one or more positions in the one or more genetic reference sequences. Furthermore, the set of processing engines may include a variant call module, such as for processing the mapped, aligned, and/or sorted reads, such as with respect to a reference genome, to thereby produce an HMM readout and/or variant call file for use with and/or detailing the variations between the sequenced genetic data and the reference genomic data. In various instances, one or more of the plurality of physical electrical interconnects may include an output from the integrated circuit for communicating result data from the mapping module and/or the alignment and/or sorting and/or variant call modules.

Particularly, with respect to the mapping module, in various embodiments, a system for executing a mapping analysis pipeline on a plurality of reads of genetic data using an index of genetic reference data is provided. In various instances, the genetic sequence, e.g., read, and/or the genetic reference data may be represented by a sequence of nucleotides, which may be stored in a memory of the system. The mapping module may be included within the integrated circuit and may be formed of a set of pre-configured and/or hardwired digital logic circuits that are interconnected by a plurality of physical electrical interconnects, which physical electrical interconnects may include a memory interface for allowing the integrated circuit to access the memory. In more particular embodiments, the hardwired digital logic circuits may be arranged as a set of processing engines, such as where each processing engine is formed of a subset of the hardwired digital logic circuits to perform one or more steps in the sequence analysis pipeline on the plurality of reads of genomic data.

For instance, in one embodiment, the set of processing engines may include a mapping module in a hardwired configuration, where the mapping module, and/or one or more processing engines thereof, is configured for receiving a read of genomic data, such as via one or more of a plurality of physical electrical interconnects, and for extracting a portion of the read in such a manner as to generate a seed therefrom. In such an instance, the read may be represented by a sequence of nucleotides, and the seed may represent a subset of the sequence of nucleotides represented by the read. The mapping module may include or be connectable to a memory that includes one or more of the reads, one or more of the seeds of the reads, at least a portion of one or more of the reference genomes, and/or one or more indexes, such as an index built from the one or more reference genomes. In certain instances, a processing engine of the mapping module may employ the seed and the index to calculate an address within the index based on the seed.

Once an address has been calculated or otherwise derived and/or stored, such as in an onboard or offboard memory, the address may be accessed in the index in the memory so as to receive a record from the address, such as a record representing position information in the genetic reference sequence. This position information may then be used to determine one or more matching positions from the read to the genetic reference sequence based on the record. Then at least one of the matching positions may be output to the memory via the memory interface.
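
For illustration only, a minimal software sketch of the seed-to-position lookup described above, in which an ordinary dictionary stands in for the hashed index and the seed length is an assumed value; neither reflects the hardwired implementation:

    SEED_LENGTH = 21  # assumed seed (k-mer) length

    def build_index(reference):
        # Record every reference k-mer and the position(s) at which it occurs;
        # this plays the role of the index built from the reference genome.
        index = {}
        for pos in range(len(reference) - SEED_LENGTH + 1):
            index.setdefault(reference[pos:pos + SEED_LENGTH], []).append(pos)
        return index

    def map_read(read, index):
        # Extract seeds from the read, look each one up in the index, and
        # collect candidate positions of the read against the reference.
        candidates = set()
        for offset in range(len(read) - SEED_LENGTH + 1):
            seed = read[offset:offset + SEED_LENGTH]
            for ref_pos in index.get(seed, []):
                candidates.add(ref_pos - offset)  # inferred read start position
        return sorted(candidates)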

In another embodiment, a set of the processing engines may include an alignment module, such as in a pre-configured and/or hardwired configuration. In this instance, one or more of the processing engines may be configured to receive one or more of the mapped positions for the read data via one or more of the plurality of physical electrical interconnects. Then the memory (internal or external) may be accessed for each mapped position to retrieve a segment of the reference sequence/genome corresponding to the mapped position. An alignment of the read to each retrieved reference segment may be calculated along with a score for the alignment. Once calculated, at least one best-scoring alignment of the read may be selected and output. In various instances, the alignment module may also implement a dynamic programming algorithm when calculating the alignment, such as one or more of a Smith-Waterman algorithm, e.g., with linear or affine gap scoring, a gapped alignment algorithm, and/or a gapless alignment algorithm. In particular instances, the calculating of the alignment may include first performing a gapless alignment to each reference segment, and based on the gapless alignment results, selecting reference segments with which to further perform gapped alignments.
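
For illustration only, a sketch of the two-stage selection described above, in which gapless screening gates the more expensive gapped alignment; the scoring routines are placeholders supplied by the caller and the rescue threshold is an assumed value:

    def choose_alignment(read, reference_segments, gapless_score, gapped_align,
                         rescue_fraction=0.9):
        # Stage 1: inexpensive gapless alignment against every candidate segment.
        screened = [(segment, gapless_score(read, segment))
                    for segment in reference_segments]
        best_gapless = max(score for _, score in screened)

        # Stage 2: full gapped alignment (e.g., affine-gap Smith-Waterman) only
        # for segments whose gapless score approaches the best gapless score.
        gapped_results = [gapped_align(read, segment)  # assumed to return (score, alignment)
                          for segment, score in screened
                          if score >= rescue_fraction * best_gapless]

        # Select and output at least one best-scoring alignment.
        return max(gapped_results, key=lambda result: result[0])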

In various embodiments, a variant call module may be provided for performing improved variant call functions that, when implemented in one or both of software and/or hardware configurations, generate superior processing speed, better result accuracy, and enhanced overall efficiency relative to the methods, devices, and systems currently known in the art. Specifically, in one aspect, improved methods for performing variant call operations in software and/or in hardware, such as for performing one or more HMM operations on genetic sequence data, are provided. In another aspect, novel devices including an integrated circuit for performing such improved variant call operations, where at least a portion of the variant call operation is implemented in hardware, are provided.

Accordingly, in various instances, the methods disclosed herein may include mapping, by a first subset of hardwired and/or quantum digital logic circuits, a plurality of reads to one or more segments of one or more genetic reference sequences. Additionally, the methods may include accessing, by the integrated and/or quantum circuits, e.g., by one or more of the plurality of physical electrical interconnects, from the memory or a cache associated therewith, one or more of the mapped reads and/or one or more of the genetic reference sequences; and aligning, by a second subset of the hardwired and/or quantum digital logic circuits, the plurality of mapped reads to the one or more segments of the one or more genetic reference sequences.

In various embodiments, the method may additionally include accessing, by the integrated and/or quantum circuit, e.g., by one or more of the plurality of physical electrical interconnects from a memory or a cache associated therewith, the aligned plurality of reads. In such an instance the method may include sorting, by a third subset of the hardwired and/or quantum digital logic circuits, the aligned plurality of reads according to their positions in the one or more genetic reference sequences. In certain instances, the method may further include outputting, such as by one or more of the plurality of physical electrical interconnects of the integrated and/or quantum circuit, result data from the mapping and/or the aligning and/or the sorting, such as where the result data includes positions of the mapped and/or aligned and/or sorted plurality of reads.

In some instances, the method may additionally include using the obtained result data, such as by a further subset of the hardwired and/or quantum digital logic circuits, for the purpose of determining how the mapped, aligned, and/or sorted data, derived from the subject's sequenced genetic sample, differs from a reference sequence, so as to produce a variant call file delineating the genetic differences between the sample and the reference. Accordingly, in various embodiments, the method may further include accessing, by the integrated and/or quantum circuit, e.g., by one or more of the plurality of physical electrical interconnects from a memory or a cache associated therewith, the mapped and/or aligned and/or sorted plurality of reads. In such an instance the method may include performing a variant call function, e.g., an HMM or paired HMM operation, on the accessed reads, by a third or fourth subset of the hardwired and/or quantum digital logic circuits, so as to produce a variant call file detailing how the mapped, aligned, and/or sorted reads vary from that of one or more reference, e.g., haplotype, sequences.

Accordingly, in accordance with particular aspects of the disclosure, presented herein is a compact hardware, e.g., chip based, or quantum accelerated platform for performing secondary and/or tertiary analyses on genetic and/or genomic sequencing data. Particularly, a platform or pipeline of hardwired and/or quantum digital logic circuits that have specifically been designed for performing secondary and/or tertiary genetic analysis, such as on sequenced genetic data, or genomic data derived therefrom, is provided. Particularly, a set of hardwired digital and/or quantum logic circuits, which may be arranged as a set of processing engines, may be provided, such as where the processing engines may be present in a preconfigured and/or hardwired and/or quantum configuration on a processing platform of the disclosure, and may be specifically designed for performing secondary mapping and/or aligning and/or variant call operations related to genetic analysis on DNA and/or RNA data, and/or may be specifically designed for performing other tertiary processing on the results data.

In particular instances, the present devices, systems, and methods of employing the same in the performance of one or more genomics and/or bioinformatics secondary and/or tertiary processing protocols have been optimized so as to deliver an improvement in processing speed that is orders of magnitude faster than standard secondary processing pipelines that are implemented in software. Additionally, the pipelines and/or components thereof as set forth herein provide better sensitivity and accuracy on a wide range of sequence derived data sets for the purposes of genomics and bioinformatics processing. In various instances, one or more of these operations may be performed by an integrated circuit that is part of or configured as a general purpose central processing unit and/or a graphics processing unit and/or a quantum processing unit.

For example, genomics and bioinformatics are fields concerned with the application of information technology and computer science to the field of genetics and/or molecular biology. In particular, bioinformatics techniques can be applied to process and analyze various genetic and/or genomic data, such as from an individual, so as to determine qualitative and quantitative information about that data that can then be used by various practitioners in the development of prophylactic, therapeutic, and/or diagnostic methods for preventing, treating, ameliorating, and/or at least identifying diseased states and/or their potential, and thus, improving the safety, quality, and effectiveness of health care on an individualized level. Hence, because of their focus on advancing personalized healthcare, the genomics and bioinformatics fields promote individualized healthcare that is proactive, instead of reactive, and this gives the subject in need of treatment the opportunity to become more involved in their own wellness. An advantage of employing the genetics, genomics, and/or bioinformatics technologies disclosed herein is that the qualitative and/or quantitative analyses of molecular biological, e.g., genetic, data can be performed on a broader range of sample sets at a much higher rate of speed and often times more accurately, thus expediting the emergence of a personalized healthcare system. Particularly, in various embodiments, the genomics and/or bioinformatics related tasks may form a genomics pipeline that includes one or more of a whole genome analysis pipeline, genotyping analysis, micro-array analysis, exome analysis, microbiome analysis, an epigenome analysis pipeline, a metagenome analysis pipeline, a joint genotyping, and/or a GATK analysis pipeline.

Accordingly, to make use of these advantages, enhanced and more accurate software implementations exist for performing one or a series of such bioinformatics based analytical techniques, such as for deployment by a general purpose CPU and/or GPU, and/or such techniques may be implemented in one or more quantum circuits of a quantum processing platform. However, a common characteristic of traditionally configured software based bioinformatics methods and systems is that they are labor intensive, take a long time to execute on such general purpose processors, and are prone to errors. Therefore, bioinformatics systems as implemented herein that could perform these algorithms, such as implemented in software by a CPU and/or GPU and/or quantum processing unit, in a less labor and/or processing intensive manner and with a greater percentage accuracy would be useful.

Such implementations have been developed and are presented herein, such as where the genomics and/or bioinformatics analyses are performed by optimized software run on a CPU and/or GPU and/or quantum computer in a system that makes use of the genetic sequence data derived by the processing units and/or integrated circuits of the disclosure. Further, it is to be noted that the cost of analyzing, storing, and sharing this raw digital data has far outpaced the cost of producing it. Accordingly, also presented herein are “just in time” storage and/or retrieval methods that optimize the storage of such data in a manner that substitutes the speed of regenerating the data in exchange for the cost of storing such data collectively. Hence, the data generation, analysis, and “just in time” or “JIT” storage methods presented herein solve a key bottleneck that is a long felt but unmet obstacle standing between the ever-growing raw data generation and storage and the real medical insight being sought from it.

Presented herein, therefore, are systems, apparatuses, and methods for implementing genomics and/or bioinformatic protocols or portions thereof, such as for performing one or more functions for analyzing genomic data, for instance, on one or both of an integrated circuit, such as on a hardware processing platform, and a general purpose processor, such as for performing one or more bioanalytic operations in software and/or on firmware. For example, as set forth herein below, in various implementations, an integrated circuit and/or quantum circuit is provided so as to accelerate one or more processes in a primary, secondary, and/or tertiary processing platform. In various instances, the integrated circuit may be employed in performing genetic analytic related tasks, such as mapping, aligning, variant calling, compressing, decompressing, and the like, in an accelerated manner, and as such the integrated circuit may include a hardware accelerated configuration. Additionally, in various instances, an integrated and/or quantum circuit may be provided such as where the circuit is part of a processing unit that is configured for performing one or more genomics and/or bioinformatics protocols on the generated mapped and/or aligned and/or variant called data.

Particularly, in a first embodiment, a first integrated circuit may be formed of an FPGA, ASIC, and/or sASIC that is coupled to or otherwise attached to the motherboard and configured, or in the case of an FPGA may be programmable by firmware to be configured, as a set of hardwired digital logic circuits that are adapted to perform at least a first set of sequence analysis functions in a genomics analysis pipeline, such as where the integrated circuit is configured as described herein above to include one or more digital logic circuits that are arranged as a set of processing engines, which are adapted to perform one or more steps in a mapping, aligning, and/or variant calling operation on the genetic data so as to produce sequence analysis results data. The first integrated circuit may further include an output, e.g., formed of a plurality of physical electrical interconnects, such as for communicating the result data from the mapping and/or the alignment and/or other procedures to the memory.

Additionally, a second integrated and/or quantum circuit may be included, coupled to or otherwise attached to the motherboard, and in communication with the memory via a communications interface. The second integrated and/or quantum circuit may be formed as a central processing unit (CPU) or graphics processing unit (GPU) or quantum processing unit (QPU) that is configured for receiving the mapped and/or aligned and/or variant called sequence analysis result data and may be adapted to be responsive to one or more software algorithms that are configured to instruct the CPU or GPU to perform one or more genomics and/or bioinformatics functions of the genomic analysis pipeline on the mapped, aligned, and/or variant called sequence analysis result data. Specifically, the genomics and/or bioinformatics related tasks may form a genomics pipeline that includes one or more of a whole genome analysis pipeline, genotyping analysis, micro-array analysis, exome analysis, microbiome analysis, an epigenome analysis pipeline, a metagenome analysis pipeline, a joint genotyping, and/or a GATK analysis pipeline.

For instance, in one embodiment, the CPU and/or GPU and/or QPU of the second integrated circuit may include software that is configured for arranging the genome analysis pipeline for executing a whole genome analysis pipeline, such as a whole genome analysis pipeline that includes one or more of genome-wide variation analysis, whole-exome DNA analysis, whole transcriptome RNA analysis, gene function analysis, protein function analysis, protein binding analysis, quantitative gene analysis, and/or a gene assembly analysis. In certain instances, the whole genome analysis pipeline may be performed for the purposes of one or more of ancestry analysis, personal medical history analysis, disease diagnostics, drug discovery, and/or protein profiling. In a particular instance, the whole genome analysis pipeline is performed for the purposes of oncology analysis. In various instances, the results of this data may be made available, e.g., globally, throughout the system.

In various instances, the CPU and/or GPU and/or a quantum processing unit (QPU) of the second integrated and/or quantum circuit may include software that is configured for arranging the genome analysis pipeline for executing a genotyping analysis, such as a genotyping analysis including joint genotyping. For instance, the joint genotyping analysis may be performed using a Bayesian probability calculation, such as a Bayesian probability calculation that results in an absolute probability that a given determined genotype is a true genotype. In other instances, the software may be configured for performing a metagenome analysis so as to produce metagenome result data that may in turn be employed in the performance of a microbiome analysis.
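
For illustration only, a minimal sketch of a Bayesian genotype posterior calculation of the kind such software might perform; the genotype labels, likelihoods, and priors are assumed example values:

    def genotype_posteriors(likelihoods, priors):
        # likelihoods: {genotype: P(observed reads | genotype)}
        # priors:      {genotype: P(genotype)}
        # Bayes' rule: the posterior is proportional to likelihood * prior,
        # normalized over all candidate genotypes.
        unnormalized = {g: likelihoods[g] * priors[g] for g in likelihoods}
        total = sum(unnormalized.values())
        return {g: value / total for g, value in unnormalized.items()}

    # Hypothetical example: three diploid genotypes at a single site.
    posteriors = genotype_posteriors(
        likelihoods={"A/A": 1e-6, "A/G": 2e-3, "G/G": 5e-4},
        priors={"A/A": 0.98, "A/G": 0.015, "G/G": 0.005})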

In certain instances, the first and/or second integrated circuit and/or the memory may be housed on an expansion card, such as a peripheral component interconnect (PCI) card. For instance, in various embodiments, one or more of the integrated circuits may be one or more chips coupled to a PCIe card or otherwise associated with the motherboard. In various instances, the integrated and/or quantum circuit(s) and/or chip(s) may be a component within a sequencer or computer, or server, such as part of a server farm. In particular embodiments, the integrated and/or quantum circuit(s) and/or expansion card(s) and/or computer(s) and/or server(s) may be accessible via the internet, e.g., cloud.

Further, in some instances, the memory may be a volatile random access memory (RAM), e.g., a dynamic random access memory (DRAM). Particularly, in various embodiments, the memory may include at least two memories, such as a first memory that is an HMEM, e.g., for storing the reference haplotype sequence data, and a second memory that is an RMEM, e.g., for storing the read of genomic sequence data. In particular instances, each of the two memories may include a write port and/or a read port, such as where the write port and the read port each access a separate clock. Additionally, each of the two memories may include a flip-flop configuration for storing a multiplicity of genetic sequence and/or processing result data.

Accordingly, in another aspect, the system may be configured for sharing memory resources amongst its component parts, such as in relation to performing some computational tasks via software, such as run by the CPU and/or GPU and/or quantum processing platform, and/or performing other computational tasks via firmware, such as via the hardware of an associated integrated circuit, e.g., FPGA, ASIC, and/or sASIC. This may be achieved in a number of different ways, such as by a direct loose or tight coupling between the CPU/GPU/QPU and the FPGA, e.g., chip or PCIe card. Such configurations may be particularly useful when distributing operations related to the processing of the large data structures associated with genomics and/or bioinformatics analyses to be used and accessed by both the CPU/GPU/QPU and the associated integrated circuit. Particularly, in various embodiments, when processing data through a genomics pipeline, as herein described, such as to accelerate overall processing function, timing, and efficiency, a number of different operations may be run on the data, which operations may involve both software and hardware processing components.

Consequently, data may need to be shared and/or otherwise communicated between the software component(s) running on the CPU and/or GPU and/or QPU and the hardware component embodied in the chip, e.g., an FPGA. Accordingly, one or more of the various steps in the genomics and/or bioinformatics processing pipeline, or a portion thereof, may be performed by one device, e.g., the CPU/GPU/QPU, and one or more of the various steps may be performed by a hardwired device, e.g., the FPGA. In such an instance, the CPU/GPU/QPU and/or the FPGA may be communicably coupled in such a manner as to allow the efficient transmission of such data, which coupling may involve the shared use of memory resources. To achieve such distribution of tasks and the sharing of information for the performance of such tasks, the various CPUs/GPUs/QPUs may be loosely or tightly coupled to one another and/or the hardware devices, e.g., FPGA, or other chip set, such as by a quick path interconnect.

Particularly, in various embodiments, a genomics analysis platform is provided. For instance, the platform may include a motherboard, a memory, and a plurality of integrated and/or quantum circuits, such as forming one or more of a CPU/GPU/QPU, a mapping module, an alignment module, a sorting module, and/or a variant call module. Specifically, in particular embodiments, the platform may include a first integrated and/or quantum circuit, such as an integrated circuit forming a central processing unit (CPU) or graphics processing unit (GPU), or a quantum circuit forming a quantum processor, that is responsive to one or more software or other algorithms that are configured to instruct the CPU/GPU/QPU to perform one or more sets of genomics analysis functions, as described herein, such as where the CPU/GPU/QPU includes a first set of physical electronic interconnects to connect with the motherboard. In various instances, the memory may also be attached to the motherboard and may further be electronically connected with the CPU/GPU/QPU, such as via at least a portion of the first set of physical electronic interconnects. In such instances, the memory may be configured for storing a plurality of reads of genomic data, and/or at least one or more genetic reference sequences, and/or an index of the one or more genetic reference sequences.

Additionally, the platform may include one or more other integrated circuit(s), such as where each of the other integrated circuits forms a field programmable gate array (FPGA) having a second set of physical electronic interconnects to connect with the CPU/GPU/QPU and the memory, such as via a point-to-point interconnect protocol. In such an instance, such as where the integrated circuit is an FPGA, the FPGA may be programmable by firmware to configure a set of hardwired digital logic circuits that are interconnected by a plurality of physical interconnects to perform a second set of genomics analysis functions, e.g., mapping, aligning, variant calling, etc. Particularly, the hardwired digital logic circuits of the FPGA may be arranged as a set of processing engines to perform one or more pre-configured steps in a sequence analysis pipeline of the genomics analysis, such as where the set(s) of processing engines include one or more of a mapping and/or aligning and/or variant call module, which modules may be formed of separate or the same subsets of processing engines.

As indicated, the system may be configured to include one or more processing engines, and in various embodiments, an included processing engine may itself be configured for determining one or more transition probabilities for the sequence of nucleotides of the read of genomic sequence going from one state to another, such as from a match state to an insert state, or from a match state to a delete state, and/or back again, such as from an insert or delete state back to a match state. Additionally, in various instances, the integrated circuit may have a pipelined configuration and/or may include a second and/or third and/or fourth subset of hardwired digital logic circuits, such as including a second set of processing engines, where the second set of processing engines includes a mapping module configured to map the read of genomic sequence to the reference haplotype sequence to produce a mapped read. A third subset of hardwired digital logic circuits may also be included, such as where the third set of processing engines includes an aligning module configured to align the mapped read to one or more positions in the reference haplotype sequence. A fourth subset of hardwired digital logic circuits may additionally be included, such as where the fourth set of processing engines includes a sorting module configured to sort the mapped and/or aligned read to its relative position in the chromosome. Like above, in various of these instances, the mapping module and/or the aligning module and/or the sorting module, e.g., along with the variant call module, may be physically integrated on the expansion card. And in certain embodiments, the expansion card may be physically integrated with a genetic sequencer, such as a next gen sequencer and the like.
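
For illustration only, a minimal transition-probability table for such a three-state (Match/Insert/Delete) model, parameterized by an assumed gap-open probability (GOP) and gap-continuation probability (GCP); the numeric values are assumptions, not the disclosed circuit:

    GOP = 0.001   # assumed gap-open probability
    GCP = 0.1     # assumed gap-continuation (extension) probability

    # Transition probabilities among the Match (M), Insert (I), and Delete (D)
    # states; the probabilities leaving each state sum to one.
    TRANSITIONS = {
        ("M", "M"): 1.0 - 2.0 * GOP,   # remain in the match state
        ("M", "I"): GOP,               # open an insertion
        ("M", "D"): GOP,               # open a deletion
        ("I", "I"): GCP,               # extend an insertion
        ("I", "M"): 1.0 - GCP,         # return from insert to match
        ("D", "D"): GCP,               # extend a deletion
        ("D", "M"): 1.0 - GCP,         # return from delete to match
    }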

Accordingly, in one aspect, an apparatus for executing one or more steps of a sequence analysis pipeline, such as on genetic data, is provided, wherein the genetic data includes one or more of a genetic reference sequence(s), such as a haplotype or hypothetical haplotype sequence, an index of the one or more genetic reference sequence(s), and/or a plurality of reads, such as of genetic and/or genomic data, which data may be stored in one or more shared memory devices, and/or processed by a distributed processing resource, such as a CPU/GPU/QPU and/or FPGA, which are coupled, e.g., tightly or loosely, together. Hence, in various instances, the apparatus may include an integrated circuit, which integrated circuit may include one or more, e.g., a set, of hardwired digital logic circuits, wherein the set of hardwired digital logic circuits may be interconnected, such as by one or a plurality of physical electrical interconnects.

Accordingly, the system may be configured to include an integrated circuit formed of one or more digital logic circuits that are interconnected by a plurality of physical electrical interconnects, one or more of the plurality of physical electrical interconnects having one or more of a memory interface and/or cache, for the integrated circuit to access the memory and/or data stored thereon and to retrieve the same, such as in a cache coherent manner between the CPU/GPU/QPU and the associated chip, e.g., FPGA. In various instances, the digital logic circuits may include at least a first subset of digital logic circuits, such as where the first subset of digital logic circuits may be arranged as a first set of processing engines, which processing engine may be configured for accessing the data stored in the cache and/or directly or indirectly coupled memory. For instance, the first set of processing engines may be configured to perform one or more steps in a mapping and/or aligning and/or sorting analysis, as described above, and/or an HMM analysis on the read of genomic sequence data and the haplotype sequence data.

More particularly, a first set of processing engines may include an HMM module, such as in a first configuration of the subset of digital logic circuits, which is adapted to access in the memory, e.g., via the memory interface, at least some of the sequence of nucleotides in the read of genomic sequence data and the haplotype sequence data, and may also be configured to perform the HMM analysis on the at least some of the sequence of nucleotides in the read of genomic sequence data and the at least some of the sequence of nucleotides in the haplotype sequence data so as to produce HMM result data. Additionally, the one or more of the plurality of physical electrical interconnects may include an output from the integrated circuit, such as for communicating the HMM result data from the HMM module, such as to a CPU/GPU/QPU of a server or server cluster.
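
For illustration only, a simplified software sketch of the pair-HMM forward computation such an HMM module performs, producing an overall read-versus-haplotype likelihood as the HMM result data; the transition table argument and the quality-based emission model are conceptual stand-ins for the hardware datapath, not the disclosed circuit:

    def pair_hmm_forward(read, base_qualities, haplotype, transitions):
        # Forward recursion over Match (M), Insert (I), and Delete (D) matrices;
        # the return value is the overall likelihood of the read given the
        # haplotype, i.e., the HMM result for this read/haplotype pair.
        rows, cols = len(read) + 1, len(haplotype) + 1
        M = [[0.0] * cols for _ in range(rows)]
        I = [[0.0] * cols for _ in range(rows)]
        D = [[0.0] * cols for _ in range(rows)]
        for j in range(cols):
            D[0][j] = 1.0 / len(haplotype)   # alignment may start anywhere
        for i in range(1, rows):
            for j in range(1, cols):
                error = 10.0 ** (-base_qualities[i - 1] / 10.0)
                emit = (1.0 - error) if read[i - 1] == haplotype[j - 1] else error / 3.0
                M[i][j] = emit * (transitions[("M", "M")] * M[i - 1][j - 1]
                                  + transitions[("I", "M")] * I[i - 1][j - 1]
                                  + transitions[("D", "M")] * D[i - 1][j - 1])
                I[i][j] = (transitions[("M", "I")] * M[i - 1][j]
                           + transitions[("I", "I")] * I[i - 1][j])
                D[i][j] = (transitions[("M", "D")] * M[i][j - 1]
                           + transitions[("D", "D")] * D[i][j - 1])
        return sum(M[-1][j] + I[-1][j] for j in range(cols))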

Accordingly, in one aspect, a method for executing a sequence analysis pipeline, such as on genetic sequence data, is provided. The genetic data may include one or more genetic reference or haplotype sequences, one or more indexes of the one or more genetic reference and/or haplotype sequences, and/or a plurality of reads of genomic data. The method may include one or more of receiving, accessing, mapping, aligning, and sorting various iterations of the genetic sequence data and/or employing the results thereof in a method for producing one or more variant call files. For instance, in certain embodiments, the method may include receiving, on an input to an integrated circuit from an electronic data source, one or more of a plurality of reads of genomic data, wherein each read of genomic data may include a sequence of nucleotides.

In various instances, the integrated circuit may be formed of a set of hardwired digital logic circuits that may be arranged as one or more processing engines. In such an instance, a processing engine may be formed of a subset of the hardwired digital logic circuits that may be in a wired configuration. In such an instance, the processing engine may be configured to perform one or more pre-configured steps, such as for implementing one or more of receiving, accessing, mapping, aligning, and sorting various iterations of the genetic sequence data and/or employing the results thereof in a method for producing one or more variant call files. In some embodiments, the provided digital logic circuits may be interconnected, such as by a plurality of physical electrical interconnects, which may include an input.

The method may further include accessing, by the integrated circuit on one or more of the plurality of physical electrical interconnects from a memory, data for performing one or more of the operations detailed herein. In various instances, the integrated circuit may be part of a chipset, such as embedded or otherwise contained as part of an FPGA, ASIC, or structured ASIC, and the memory may be directly or indirectly coupled to one or both of the chip and/or a CPU/GPU/QPU associated therewith. For instance, the memory may be a plurality of memories, one of each coupled to the chip and a CPU/GPU/QPU that is itself coupled to the chip, e.g., loosely.

In other instances, the memory may be a single memory that may be coupled to a CPU/GPU/QPU that is itself tightly coupled to the FPGA, e.g., via a tight processing interconnect or quick path interconnect, e.g., QPI, and thereby accessible to the FPGA, such as in a cache coherent manner. Accordingly, the integrated circuit may be directly or indirectly coupled to the memory so as to access data relevant to performing the functions herein presented, such as for accessing one or more of a plurality of reads, one or more genetic reference or theoretical reference sequences, and/or an index of the one or more genetic reference sequences, e.g., in the performance of a mapping operation.

Hence, in various instances, implementations of various aspects of the disclosure may include, but are not limited to: apparatuses, systems, and methods including one or more features as described in detail herein, as well as articles that comprise a tangibly embodied machine-readable medium operable to cause one or more machines (e.g., computers, etc.) to result in operations described herein. Similarly, computer systems are also described that may include one or more processors and/or one or more memories coupled to the one or more processors. Accordingly, computer implemented methods consistent with one or more implementations of the current subject matter can be implemented by one or more data processors residing in a single computing system or multiple computing systems containing multiple computers, such as in a computing or super-computing bank.

Such multiple computing systems can be connected and can exchange data and/or commands or other instructions or the like via one or more connections, including but not limited to a connection over a network (e.g., the Internet, a wireless wide area network, a local area network, a wide area network, a wired network, a physical electrical interconnect, or the like), via a direct connection between one or more of the multiple computing systems, etc. A memory, which can include a computer-readable storage medium, may include, encode, store, or the like one or more programs that cause one or more processors to perform one or more of the operations associated with one or more of the algorithms described herein.

The details of one or more variations of the subject matter described herein are set forth in the accompanying drawings and the description below. Other features and advantages of the subject matter described herein will be apparent from the description and drawings, and from the claims. While certain features of the currently disclosed subject matter are described for illustrative purposes in relation to an enterprise resource software system or other business software solution or architecture, it should be readily understood that such features are not intended to be limiting. The claims that follow this disclosure are intended to define the scope of the protected subject matter.

BRIEF DESCRIPTION OF THE FIGURES

The accompanying drawings, which are incorporated in and constitute a part of this specification, show certain aspects of the subject matter disclosed herein and, together with the description, help explain some of the principles associated with the disclosed implementations.

FIG. 1 depicts an HMM 3-state based model illustrating the transition probabilities of going from one state to another.

FIG. 2 depicts a high-level view of an integrated circuit of the disclosure including an HMM interface structure.

FIG. 3 depicts the integrated circuit of FIG. 2, showing the HMM cluster features in greater detail.

FIG. 4 depicts an overview of HMM related data flow throughout the system including both software and hardware interactions.

FIG. 5 depicts exemplary HMM cluster collar connections.

FIG. 6 depicts a high-level view of the major functional blocks within an exemplary HMM hardware accelerator.

FIG. 7 depicts an exemplary HMM matrix structure and hardware processing flow.

FIG. 8 depicts an enlarged view of a portion of FIG. 2 showing the data flow and dependencies between nearby cells in the HMM M, I, and D state computations within the matrix.

FIG. 9 depicts exemplary computations useful for M, I, D state updates.

FIG. 10 depicts M, I, and D state update circuits, including the effects of the simplifying assumptions of FIG. 9 related to transition probabilities and the effect of sharing some M, I, D adder resources with the final sum operations.

FIG. 11 depicts log domain M, I, D state calculation details.

FIG. 12 depicts an HMM state transition diagram showing the relation between GOP, GCP and transition probabilities.

FIG. 13 depicts an HMM Transprobs and Priors generation circuit to support the general state transition diagram of FIG. 12.

FIG. 14 depicts a simplified HMM state transition diagram showing the relation between GOP, GCP and transition probabilities.

FIG. 15 depicts an HMM Transprobs and Priors generation circuit to support the simplified state transition diagram.

FIG. 16 depicts an exemplary theoretical HMM matrix and illustrates how such an HMM matrix may be traversed.

FIG. 17 presents a method for performing a multi-region joint detection pre-processing procedure.

FIG. 18 presents an exemplary method for computing a connection matrix such as in the pre-processing procedure of FIG. 17.

FIG. 19 is a graphical representation of the exemplary pileup pursuant to the connection matrix of FIG. 18.

FIG. 20 is a processing matrix for performing the pre-processing procedure of FIG. 17.

FIG. 21 is an example of a bubble formation in a De Bruijn graph in accordance with the methods of FIG. 20.

FIG. 22 is an example of a variant pathway through an exemplary De Bruijn graph.

FIG. 23 is a graphical representation of an exemplary sorting function.

FIG. 24 is another example of a processing matrix for a pruned multi-region joint detection procedure.

FIG. 25 illustrates a joint pileup of paired reads for two regions.

FIG. 26 sets forth a probability table in accordance with the disclosure herein.

FIG. 27 is a further example of a processing matrix for a multi-region joint detection procedure.

FIG. 28 represents a selection of candidate solutions for the joint pileup of FIG. 25.

FIG. 29 represents a further selection of candidate solutions for the pileup of FIG. 28, after a pruning function has been performed.

FIG. 30 represents the final candidates of FIG. 28, and their associated probabilities, after the performance of an MRJD function.

FIG. 31 illustrates the ROC curves for MRJD and a conventional detector.

FIG. 32 illustrates the same results of FIG. 31 displayed as a function of the sequence similarity of the references.

FIG. 33A depicts an exemplary architecture illustrating a loose coupling between a CPU and an FPGA of the disclosure.

FIG. 33B depicts an exemplary architecture illustrating a tight coupling between a CPU and an FPGA of the disclosure.

FIG. 34A depicts a direct coupling of a CPU and an FPGA of the disclosure.

FIG. 34B depicts an alternative embodiment of the direct coupling of a CPU and an FPGA of FIG. 34A.

FIG. 35 depicts an embodiment of a package of a combined CPU and FPGA, where the two devices share a common memory and/or cache.

FIG. 36 illustrates a core of CPUs sharing one or more memories and/or caches, wherein the CPUs are configured for communicating with one or more FPGAs that may also include a shared or common memory or caches.

FIG. 37 illustrates an exemplary method of data transfer throughout the system.

FIG. 38 depicts the embodiment of FIG. 36 in greater detail.

FIG. 39 depicts an exemplary method for the processing of one or more jobs of a system of the disclosure.

FIG. 40 depicts a block diagram for a genomic infrastructure for onsite and/or cloud based genomics processing and analysis.

FIG. 41A depicts a block diagram of a local and/or cloud based computing function of FIG. 40 for a genomic infrastructure for onsite and/or cloud based genomics processing and analysis.

FIG. 41B depicts the block diagram of FIG. 41A illustrating greater detail regarding the computing function for a genomic infrastructure for onsite and/or cloud based genomics processing and analysis.

FIG. 41C depicts the block diagram of FIG. 40 illustrating greater detail regarding the 3rd-Party analytics function for a genomic infrastructure for onsite and/or cloud based genomics processing and analysis.

FIG. 42A depicts a block diagram illustrating a hybrid cloud configuration.

FIG. 42B depicts the block diagram of FIG. 42A in greater detail, illustrating a hybrid cloud configuration.

FIG. 42C depicts the block diagram of FIG. 42A in greater detail, illustrating a hybrid cloud configuration.

FIG. 43 depicts a block diagram illustrating a primary, secondary, and/or tertiary analysis pipeline as presented herein.

FIG. 44 depicts a flow diagram for an analysis pipeline of the disclosure.

FIG. 45 is a block diagram of a hardware processor architecture in accordance with an implementation of the disclosure.

FIG. 46 is a block diagram of a hardware processor architecture in accordance with another implementation.

FIG. 47 is a block diagram of a hardware processor architecture in accordance with yet another implementation.

FIG. 48 illustrates a genetic sequence analysis pipeline.

FIG. 49 illustrates processing steps using a genetic sequence analysis hardware platform.

FIG. 50A illustrates an apparatus in accordance with an implementation of the disclosure.

FIG. 50B illustrates another apparatus in accordance with an alternative implementation of the disclosure.

FIG. 51 illustrates a genomics processing system in accordance with an implementation.

DETAILED DESCRIPTION OF THE DISCLOSURE

As summarized above, the present disclosure is directed to devices, systems, and methods for employing the same in the performance of one or more genomics and/or bioinformatics protocols, such as a mapping, aligning, sorting, and/or variant call protocol on data generated through a primary processing procedure, such as on genetic sequence data. For instance, in various aspects, the devices, systems, and methods herein provided are configured for performing secondary analysis protocols on genetic data, such as data generated by the sequencing of RNA and/or DNA, e.g., by a Next Gen Sequencer (“NGS”). In particular embodiments, one or more secondary processing pipelines for processing genetic sequence data are provided, such as where the pipelines, and/or individual elements thereof, may be implemented in software, hardware, or a combination thereof in a distributed and/or an optimized fashion so as to deliver superior sensitivity and improved accuracy on a wider range of sequence derived data than is currently available in the art. Additionally, as summarized above, the present disclosure is directed to devices, systems, and methods for employing the same in the performance of one or more genomics and/or bioinformatics tertiary protocols, such as a whole genome analysis protocol, genotyping analysis, micro-array analysis, exome analysis, microbiome analysis, an epigenome analysis pipeline, a metagenome analysis pipeline, a joint genotyping, and/or a GATK analysis protocol, such as on mapped, aligned, and/or other genetic sequence data, such as employing one or more variant call files.

Accordingly, provided herein are software and/or hardware, e.g., chip based, accelerated platform analysis technologies for performing secondary and/or tertiary analysis of DNA/RNA sequencing data. More particularly, a platform, or pipeline, of processing engines, such as in a software implemented and/or hardwired configuration, is provided, which engines have specifically been designed for performing secondary genetic analysis, e.g., mapping, aligning, sorting, and/or variant calling, and/or may be specifically designed for performing tertiary genetic analysis, such as whole genome, genotyping, micro-array, exome, microbiome, epigenome, metagenome, joint genotyping, and/or GATK analysis, such as with respect to genetic based sequencing data, which may have been generated in an optimized format that delivers an improvement in processing speed that is orders of magnitude faster than standard pipelines that are implemented in known software alone. Additionally, the pipelines presented herein provide better sensitivity and accuracy on a wide range of sequence derived data sets, such as on nucleic acid or protein derived sequences.

As indicated above, in various instances, it is a goal of bioinformatics processing to determine individual genomes and/or protein sequences of people, which determinations may be used in gene discovery protocols as well as for prophylactic and/or therapeutic regimes to better enhance the livelihood of each particular person and humankind as a whole. Further, knowledge of an individual's genome and/or protein complement may be used, such as in drug discovery and/or FDA trials, to better predict with particularity which, if any, drugs will be likely to work on an individual and/or which would be likely to have deleterious side effects, such as by analyzing the individual's genome and/or a protein profile derived therefrom and comparing the same with a predicted biological response from such drug administration.

Such bioinformatics processing usually involves three well defined, but typically separate, phases of information processing. The first phase, termed primary processing, involves DNA/RNA sequencing, where a subject's DNA and/or RNA is obtained and subjected to various processes whereby the subject's genetic code is converted to a machine-readable digital code, e.g., a FASTQ file. The second phase, termed secondary processing, involves using the subject's generated digital genetic code for the determination of the individual's genetic makeup, e.g., determining the individual's genomic nucleotide sequence. And the third phase, termed tertiary processing, involves performing one or more analyses on the subject's genetic makeup so as to determine therapeutically useful information therefrom.

Accordingly, once a subject's genetic code is sequenced, such as by a NextGen sequencer, so as to produce a machine readable digital representation of the subject's genetic code, e.g., in a FASTQ and/or BCL file format, it may be useful to further process the digitally encoded genetic sequence data obtained from the sequencer and/or sequencing protocol, such as by subjecting the digitally represented data to secondary processing. This secondary processing, for instance, can be used to map and/or align and/or otherwise assemble an entire genomic and/or protein profile of an individual, such as where the individual's entire genetic makeup is determined, for instance, where each and every nucleotide of each and every chromosome is determined in sequential order such that the composition of the individual's entire genome has been identified. In such processing, the genome of the individual may be assembled such as by comparison to a reference genome, such as a reference standard, e.g., one or more genomes obtained from the Human Genome Project or the like, so as to determine how the individual's genetic makeup differs from that of the referent(s). This process is commonly known as variant calling. As the difference between the DNA of any one person and another is about 1 in 1,000 base pairs, such a variant calling process can be very labor and time intensive, requiring many steps that may need to be performed one after the other and/or simultaneously, such as in a pipeline, so as to analyze the subject's genomic data and determine how that genetic sequence differs from a given reference.

In performing a secondary analysis pipeline, such as for generating a variant call file for a given query sequence of an individual subject, a genetic sample, e.g., DNA, RNA, protein sample, or the like, may be obtained from the subject. The subject's DNA/RNA may then be sequenced, e.g., by a NextGen Sequencer (NGS) and/or a sequencer-on-a-chip technology, e.g., in a primary processing step, so as to produce a multiplicity of read sequence segments (“reads”) covering all or a portion of the individual's genome, such as in an oversampled manner. The end product generated by the sequencing device may be a collection of short sequences, e.g., reads, that represent small segments of the subject's genome, e.g., short genetic sequences representing the individual's entire genome. As indicated, typically, the information represented by these reads may be in an image file or in a digital format, such as in FASTQ, BCL, or other similar file format.

Particularly, in a typical secondary processing protocol, a subject's genetic makeup is assembled by comparison to a reference genome. This comparison involves the reconstruction of the individual's genome from millions upon millions of short read sequences and/or the comparison of the whole of the individual's DNA to an exemplary DNA sequence model. In a typical secondary processing protocol, an image, FASTQ, and/or BCL file is received from the sequencer containing the raw sequenced read data. In order to compare the subject's genome to that of the standard reference genome, it needs to be determined where each of these reads maps to the reference genome, such as how each is aligned with respect to one another, and/or how each read can also be sorted by chromosome order so as to determine at what position and in which chromosome each read belongs. One or more of these functions may take place prior to performing a variant call function on the entire full-length sequence, e.g., once assembled. Specifically, once it is determined where in the genome each read belongs, the full length genetic sequence may be determined, and then the differences between the subject's genetic code and that of the referent can be assessed.

For instance, reference based assembly in a typical secondary processing assembly protocol involves the comparison of the sequenced genomic DNA/RNA of a subject to that of one or more standards, e.g., known reference sequences. Various mapping, aligning, sorting, and/or variant calling algorithms have been developed to help expedite these processes. These algorithms, therefore, may include some variation of one or more of: mapping, aligning, and/or sorting the millions of reads received from the image, FASTQ, and/or BCL file communicated by the sequencer, to determine where on each chromosome each particular read is located. It is noted that these processes may be implemented in software or hardware, such as by the methods and/or devices described in U.S. Pat. Nos. 9,014,989 and 9,235,680, both assigned to Edico Genome Corporation and incorporated by reference herein in their entireties.

Often a common feature behind the functioning of these various algorithms and/or hardware implementations is their use of an index and/or an array to expedite their processing function.

For example, with respect to mapping, a large quantity, e.g., all, of the sequenced reads may be processed to determine the possible locations in the reference genome to which those reads could possibly align. One methodology that can be used for this purpose is to do a direct comparison of the read to the reference genome so as to find all the positions where it matches. Another methodology is to employ a prefix or suffix array, or to build out a prefix or suffix tree, for the purpose of mapping the reads to various positions in the reference genome. A typical algorithm useful in performing such a function is a Burrows-Wheeler transform, which is used to map a selection of reads to a reference using a compression formula that compresses repeating sequences of data.

A further methodology is to employ a hash table, such as where a selected subset of the reads, a k-mer of a selected length “k”, e.g., a seed, are placed in a hash table as keys and the reference sequence is broken into equivalent k-mer length portions and those portions and their location are inserted by an algorithm into the hash table at those locations in the table to which they map according to a hashing function. A typical algorithm for performing this function is “BLAST”, a Basic Local Alignment Search Tool. Such hash table based programs compare query nucleotide or protein sequences to one or more standard reference sequence databases and calculate the statistical significance of matches. In such manners as these, it may be determined where any given read is possibly located with respect to a reference genome. These algorithms are useful because they require less memory and fewer look-ups, e.g., in look-up tables (LUTs), and therefore require fewer processing resources and less time in the performance of their functions than would otherwise be the case, such as if the subject's genome were being assembled by direct comparison, such as without the use of these algorithms.
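
By way of a non-limiting illustration only, the following minimal sketch (in Python, and not the hardwired implementation described herein) shows one way such a seed-based hash table may be built from a reference and queried with the seeds of a read; the function names, the choice of k, and the toy sequences are hypothetical.

    from collections import defaultdict

    def build_seed_index(reference: str, k: int = 8) -> dict:
        """Hash every k-mer (seed) of the reference to the list of positions at which it occurs."""
        index = defaultdict(list)
        for pos in range(len(reference) - k + 1):
            index[reference[pos:pos + k]].append(pos)
        return index

    def map_read(read: str, index: dict, k: int = 8):
        """Return candidate reference positions implied by each seed of the read."""
        candidates = set()
        for offset in range(len(read) - k + 1):
            for ref_pos in index.get(read[offset:offset + k], []):
                # Shift by the seed's offset so every seed of the read votes for the read's start position.
                candidates.add(ref_pos - offset)
        return sorted(candidates)

    reference = "ACGTACGTGGTCACGTACGTTTACGGAC"
    idx = build_seed_index(reference, k=8)
    print(map_read("CGTGGTCACGTA", idx, k=8))

In this sketch, repeated agreement among the seeds of a read on the same candidate start position indicates a likely mapping location for that read.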

Additionally, an aligning function may be performed to determine, out of all the possible locations a given read may map to on a genome, such as in those instances where a read may map to multiple positions in the genome, which is in fact the location from which it actually was derived, such as by being sequenced therefrom by the original sequencing protocol. This function may be performed on a number of the reads, e.g., mapped reads, of the genome and a string of ordered nucleotide bases representing a portion or the entire genetic sequence of the subject's DNA/RNA may be obtained. Along with the ordered genetic sequence a score may be given for each nucleotide in a given position, representing the likelihood that for any given nucleotide position, the nucleotide, e.g., “A”, “C”, “G”, “T” (or “U”), predicted to be in that position is in fact the nucleotide that belongs in that assigned position. Typical algorithms for performing alignment functions include the Needleman-Wunsch and Smith-Waterman algorithms. In either case, these algorithms perform sequence alignments between a string of the subject's query genomic sequence and a string of the reference genomic sequence whereby, instead of comparing the entire genomic sequences, one with the other, segments of a selection of possible lengths are compared.

Once the reads have been assigned a position, such as relative to the reference genome, which may include identifying to which chromosome the read belongs and/or its offset from the beginning of that chromosome, the reads may be sorted by position. This may enable downstream analyses to take advantage of the oversampling procedures described herein. All of the reads that overlap a given position in the genome will be adjacent to each other after sorting and they can be organized into a pileup and readily examined to determine if the majority of them agree with the reference value or not. If they do not, a variant can be flagged.
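
As a minimal illustration of the sorting and pileup concept described above (a Python sketch with hypothetical read tuples, not the pipeline's actual data structures), reads may be ordered by chromosome and offset, and the bases that overlapping reads place at a given position may be gathered into a column and compared against the reference value:

    from collections import Counter

    # Hypothetical aligned reads: (chromosome, 0-based start position, sequence).
    aligned_reads = [
        ("chr1", 4, "ACGTT"),
        ("chr1", 2, "GTACG"),
        ("chr1", 4, "ACGGT"),
    ]

    # Sort by chromosome, then by offset from the beginning of that chromosome.
    aligned_reads.sort(key=lambda r: (r[0], r[1]))

    def pileup_call(reads, chrom, position, reference_base):
        """Collect the bases that overlapping reads place at one position and flag a possible variant."""
        column = Counter()
        for read_chrom, start, seq in reads:
            if read_chrom == chrom and start <= position < start + len(seq):
                column[seq[position - start]] += 1
        consensus, _ = column.most_common(1)[0] if column else (reference_base, 0)
        return consensus, consensus != reference_base

    print(pileup_call(aligned_reads, "chr1", 7, reference_base="T"))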

For instance, in various embodiments, the methods of the disclosure may include generating a variant call file (VCF) identifying one or more, e.g., all, of the genetic variants in the individual whose DNA/RNA was sequenced, e.g., relative to one or more reference genomes. For instance, once the actual sample genome is known and compared to the reference genome, the variations between the two can be determined, and a list of all the variations/deviations between the reference genome(s) and the sample genome may be called out, e.g., a variant call file may be produced. Particularly, in one aspect, a variant call file containing all the variations of the subject's genetic sequence relative to the reference sequence(s) may be generated.

As indicated above, such variations between the two genetic sequences may be due to a number of reasons. Hence, in order to generate such a file, the genome of the subject must be sequenced and rebuilt prior to determining its variants. There are, however, several problems that may occur when attempting to generate such an assembly. For example, there may be problems with the chemistry, the sequencing machine, and/or human error that occur in the sequencing process. Furthermore, there may be genetic artifacts that make such reconstructions problematic. For instance, a typical problem with performing such assemblies is that there are sometimes huge portions of the genome that repeat themselves, such as long sections of the genome that include the same strings of nucleotides. Hence, because any genetic sequence is not unique everywhere, it may be difficult to determine where in the genome an identified read actually maps and aligns. Additionally, there may be a single nucleotide polymorphism (SNP), such as wherein one base in the subject's genetic sequence has been substituted for another; there may be more extensive substitutions of a plurality of nucleotides; there may be an insertion or a deletion, such as where one or a multiplicity of bases have been added to or deleted from the subject's genetic sequence; and/or there may be a structural variant, e.g., such as caused by the crossing of legs of two chromosomes; and/or there may simply be an offset causing a shift in the sequence.

Accordingly, there are two main possibilities for variation. For one, there is an actual variation at the particular location in question, for instance, where the person's genome is in fact different at a particular location than that of the reference, e.g., there is a natural variation due to an SNP (one base substitution), an insertion or deletion (of one or more nucleotides in length), and/or there is a structural variant, such as where the DNA material from one chromosome gets crossed onto a different chromosome or leg, or where a certain region gets copied twice in the DNA. Alternatively, a variation may be caused by there being a problem in the read data, either through chemistry or the machine, sequencer or aligner, or other human error. The methods disclosed herein may be employed in a manner so as to compensate for these types of errors, and more particularly so as to distinguish between errors in variation due to chemistry, machine, or human, and real variations in the sequenced genome. More specifically, the methods, apparatuses, and systems for employing the same, as herein described, have been developed so as to clearly distinguish between these two different types of variations and therefore to better ensure the accuracy of any call files generated so as to correctly identify true variants.

Hence, in particular embodiments, a platform of technologies for performing genetic analyses is provided where the platform may include the performance of one or more of: mapping, aligning, sorting, local realignment, duplicate marking, base quality score recalibration, variant calling, compression, and/or decompression functions. For instance, in various aspects a pipeline may be provided wherein the pipeline includes performing one or more analytic functions, as described herein, on a genomic sequence of one or more individuals, such as data obtained in an image file and/or a digital, e.g., FASTQ or BCL, file format from an automated sequencer. A typical pipeline to be executed may include one or more of sequencing genetic material, such as a portion or an entire genome, of one or more individual subjects, which genetic material may include DNA, ssDNA, RNA, rRNA, tRNA, and the like, and/or in some instances the genetic material may represent coding or non-coding regions, such as exomes and/or episomes of the DNA. The pipeline may include one or more of performing an image processing procedure, a base calling and/or error correction operation, such as on the digitized genetic data, and/or may include one or more of performing a mapping, an alignment, and/or a sorting function on the genetic data. In certain instances, the pipeline may include performing one or more of a realignment, a deduplication, a base quality or score recalibration, a reduction and/or compression, and/or a decompression on the digitized genetic data. In certain instances the pipeline may include performing a variant calling operation, such as via a Hidden Markov Model, on the genetic data.

Accordingly, in certain instances, the implementation of one or more of these platform functions is for the purpose of performing one or more of determining and/or reconstructing a subject's consensus genomic sequence, comparing a subject's genomic sequence to a referent sequence, e.g., a reference or model genetic sequence, determining the manner in which the subject's genomic DNA or RNA differs from a referent, e.g., variant calling, and/or for performing a tertiary analysis on the subject's genomic sequence, such as for genome-wide variation analysis, gene function analysis, protein function analysis, e.g., protein binding analysis, quantitative and/or assembly analysis of genomes and/or transcriptomes, as well as for various diagnostic and/or prophylactic and/or therapeutic evaluation analyses.

As indicated above, in one aspect one or more of these platform functions, e.g., mapping, aligning, sorting, realignment, duplicate marking, base quality score recalibration, variant calling, compression, and/or decompression functions, is configured for implementation in software. In some aspects, one or more of these platform functions, e.g., mapping, aligning, sorting, local realignment, duplicate marking, base quality score recalibration, variant calling, compression, and/or decompression functions, is configured for implementation in hardware, e.g., firmware. In certain aspects, these genetic analysis technologies may employ improved algorithms that may be implemented by software that is run in a less processing intensive and/or less time consuming manner and/or with greater percentage accuracy, e.g., the hardware implemented functionality is faster, less processing intensive, and more accurate.

For instance, in certain embodiments, improved algorithms for performing such secondary and/or tertiary processing, as disclosed herein, are provided. The improved algorithms are directed to more efficiently and/or more accurately performing one or more of mapping, aligning, sorting, and/or variant calling functions, such as on an image file and/or a digital representation of DNA/RNA sequence data obtained from a sequencing platform, such as in a FASTQ or BCL file format obtained from an automated sequencer such as one of those set forth above. In particular embodiments, the improved algorithms may be directed to more efficiently and/or more accurately performing one or more of local realignment, duplicate marking, base quality score recalibration, variant calling, compression, and/or decompression functions. Further, as described in greater detail herein below, in certain embodiments, these genetic analysis technologies may employ one or more algorithms, such as improved algorithms, that may be implemented by one or more of software and/or hardware that is run in a less processing intensive and/or less time consuming manner and/or with greater percentage accuracy than various traditional software implementations for doing the same. In various instances, improved algorithms for implementation on a quantum processing platform are provided.

Hence, in various aspects, presented herein are systems, apparatuses, and methods for implementing bioinformatics protocols, such as for performing one or more functions for analyzing genetic data, such as genomic data, for instance, via one or more optimized algorithms and/or on one or more optimized integrated and/or quantum circuits, such as on one or more hardware processing platforms. In one instance, systems and methods are provided for implementing one or more algorithms, e.g., in software and/or in firmware and/or by a quantum processing circuit, for the performance of one or more steps for analyzing genomic data in a bioinformatics protocol, such as where the steps may include the performance of one or more of: mapping, aligning, sorting, local realignment, duplicate marking, base quality score recalibration, variant calling, compression, and/or decompression; and may further include one or more steps in a tertiary processing platform. Accordingly, in certain instances, methods, including software, firmware, hardware, and/or quantum processing algorithms for performing the methods, are presented herein where the methods involve the performance of an algorithm, such as an algorithm for implementing one or more genetic analysis functions such as mapping, aligning, sorting, realignment, duplicate marking, base quality score recalibration, variant calling, compression, decompression, and/or one or more tertiary processing protocols, where the algorithm, e.g., including firmware, has been optimized in accordance with the manner in which it is to be implemented.

In particular, where the algorithm is to be implemented in a software solution, the algorithm and/or its attendant processes have been optimized so as to be performed faster and/or with better accuracy for execution by that media. Likewise, where the functions of the algorithm are to be implemented in a hardware solution, e.g., as firmware, the hardware has been designed to perform these functions and/or their attendant processes in an optimized manner so as to be performed faster and/or with better accuracy for execution by that media. Further, where the algorithm is to be implemented in a quantum processing solution, the algorithm and/or its attendant processes have been optimized so as to be performed faster and/or with better accuracy for execution by that media. These methods, for instance, can be employed such as in an iterative mapping, aligning, sorting, variant calling, and/or tertiary processing procedure. In another instance, systems and methods are provided for implementing the functions of one or more algorithms for the performance of one or more steps for analyzing genomic data in a bioinformatics protocol, as set forth herein, wherein the functions are implemented on a hardware and/or quantum accelerator, which may or may not be coupled with one or more general purpose processors and/or supercomputers and/or quantum computers.

More specifically, in some instances, methods, and/or machinery for implementing those methods, for performing secondary analytics on data pertaining to the genetic composition of a subject are provided. In one instance, the analytics to be performed may involve reference based reconstruction of the subject genome. For instance, reference based mapping involves the use of a reference genome, which may be generated from sequencing the genome of a single individual or multiple individuals, or it may be an amalgamation of various people's DNA/RNA that have been combined in such a manner so as to produce a prototypical, standard reference genome to which any individual's genetic material, e.g., DNA/RNA, may be compared, for example, so as to determine and reconstruct the individual's genetic sequence and/or for determining the difference between their genetic makeup and that of the standard reference, e.g., variant calling.

Particularly, a reason for performing a secondary analysis on a subject's sequenced DNA/RNA is to determine how the subject's DNA/RNA varies from that of the reference, such as to determine one, a multiplicity, or all of the differences in the nucleotide sequence of the subject from that of the reference. For instance, the difference between the genetic sequences of any two random persons is about 1 in 1,000 base pairs, which, when taken in view of the entire genome of over 3 billion base pairs, amounts to a variation of up to 3,000,000 divergent base pairs per person. Determining these differences may be useful such as in a tertiary analysis protocol, for instance, so as to predict the potential for the occurrence of a diseased state, such as because of a genetic abnormality, and/or the likelihood of success of a prophylactic or therapeutic modality, such as based on how a prophylactic or therapeutic is expected to interact with the subject's DNA or the proteins generated therefrom. In various instances, it may be useful to perform both a de novo and a reference based reconstruction of the subject's genome so as to confirm the results of one against the other, and to, where desirable, enhance the accuracy of a variant calling protocol.

Accordingly, in one aspect, in various embodiments, once the subject's genome has been reconstructed and/or a VCF has been generated, such data may then be subjected to tertiary processing so as to interpret it, such as for determining what the data means with respect to identifying what diseases this person may suffer from or may have the potential to suffer from, and/or for determining what treatments or lifestyle changes this subject may want to employ so as to ameliorate and/or prevent a diseased state. For example, the subject's genetic sequence and/or their variant call file may be analyzed to determine clinically relevant genetic markers that indicate the existence or potential for a diseased state and/or the efficacy that a proposed therapeutic or prophylactic regimen may have on the subject. This data may then be used to provide the subject with one or more therapeutic or prophylactic regimens so as to better the subject's quality of life, such as treating and/or preventing a diseased state.

Particularly, once one or more of an individual's genetic variations are determined, such variant call file information can be used to develop medically useful information, which in turn can be used to determine, e.g., using various known statistical analysis models, health related data and/or medically useful information, e.g., for diagnostic purposes, e.g., diagnosing a disease or a potential therefor, clinical interpretation (e.g., looking for markers that represent a disease variant), whether the subject should be included or excluded in various clinical trials, and other such purposes. More particularly, in various instances, the generated genomics and/or bioinformatics processed results data may be employed in the performance of one or more genomics and/or bioinformatics tertiary protocols, such as a whole genome analysis protocol, genotyping analysis, micro-array analysis, exome analysis, microbiome analysis, an epigenome analysis pipeline, a metagenome analysis pipeline, a joint genotyping, and/or a GATK analysis protocol.

As there are a finite number of diseased states that are caused by genetic malformations, in tertiary processing variants of a certain type, e.g., those known to be related to the onset of diseased states, can be queried for, such as by determining if one or more genetic based diseased markers are included in the variant call file of the subject. Consequently, in various instances, the methods herein disclosed may involve analyzing, e.g., scanning, the VCF and/or the generated sequence against a known disease sequence variant, such as in a database of genomic markers therefor, so as to identify the presence of the genetic marker in the VCF and/or the generated sequence, and, if present, to make a call as to the presence or potential for a genetically induced diseased state. Since there are a large number of known genetic variations and a large number of individuals suffering from diseases caused by such variations, in some embodiments, the methods disclosed herein may entail the generation of one or more databases linking sequenced data for an entire genome and/or a variant call file pertaining thereto, e.g., such as from an individual or a plurality of individuals, and a diseased state, and/or searching the generated databases to determine if a particular subject has a genetic composition that would predispose them to having such a diseased state. Such searching may involve a comparison of one entire genome with one or more others, or a fragment of a genome, such as a fragment containing only the variations, to one or more fragments of one or more other genomes, such as in a database of reference genomes or fragments thereof.

Therefore, in various instances, a pipeline of the disclosure may include one or more modules, wherein the modules are configured for performing one or more functions, such as an image processing or a base calling and/or error correction operation and/or a mapping and/or an alignment, e.g., a gapped or gapless alignment, and/or a sorting function on genetic data, e.g., sequenced genetic data. And in various instances, the pipeline may include one or more modules, wherein the modules are configured for performing one or more of a local realignment, a deduplication, a base quality score recalibration, a variant calling, e.g., HMM, a reduction and/or compression, and/or a decompression on the genetic data. Additionally, the pipeline may include one or more modules, wherein the modules are configured for performing a tertiary analysis protocol, such as a whole genome analysis protocol, genotyping analysis, micro-array analysis, exome analysis, microbiome analysis, an epigenome analysis pipeline, a metagenome analysis pipeline, a joint genotyping, and/or a GATK analysis protocol.

Many of these modules may either be performed by software or on hardware or remotely, e.g., via software or hardware, such as on the cloud or a remote server and/or server bank, such as a quantum computing cluster. Additionally, many of these steps and/or modules of the pipeline are optional and/or can be arranged in any logical order and/or omitted entirely. For instance, the software and/or hardware disclosed herein may or may not include an image processing and/or a base calling or sequence correction algorithm, such as where there may be concern that such functions may result in a statistical bias. Consequently, the system may include or may not include the base calling and/or sequence correction function, respectively, dependent on the level of accuracy and/or efficiency desired. And as indicated above, one or more of the pipeline functions may be employed in the generation of a genomic sequence of a subject, such as through a reference based genomic reconstruction. Also as indicated above, in certain instances, the output from the pipeline is a variant call file indicating a portion or all of the variants in a genome or a portion thereof.

Particularly, once the reads are assigned a position relative to the reference genome, which may include identifying to which chromosome the read belongs and its offset from the beginning of that chromosome, they may be sorted, such as by position. This enables downstream analyses to take advantage of the various oversampling protocols described herein. All of the reads that overlap a given position in the genome may be positioned adjacent to each other after sorting and they can be piled up and readily examined to determine if the majority of them agree with the reference value or not. If they do not, as indicated above, a variant can be flagged.

Accordingly, as indicated above with respect to mapping, the image file, BCL file, and/or FASTQ file obtained from the sequencer is comprised of a plurality, e.g., millions to a billion or more, of reads consisting of short strings of nucleotide sequence data representing a portion or the entire genome of an individual. Mapping, in general, involves plotting the reads to all the locations in the reference genome to where there is a match. For example, dependent on the size of the read, there may be one or a plurality of locations where the read substantially matches a corresponding sequence in the reference genome. Hence, the mapping and/or other functions disclosed herein may be configured for determining which, out of all the possible locations to which one or more reads may match in the reference genome, is actually the true location to which they map.

For instance, in various instances, an index of a reference genome may be generated or otherwise provided, so that the reads or portions of the reads may be looked up, e.g., within a Look-Up Table (LUT), in reference to the index, thereby retrieving indications of locations in the reference, so as to map the reads to the reference. Such an index of the reference can be constructed in various forms and queried in various manners. In some methods, the index may include a prefix and/or a suffix tree. In particular methods, the index may be derived from a Burrows/Wheeler transform of the reference. Hence, alternatively, or in addition to employing a prefix or a suffix tree, a Burrows/Wheeler transform can be performed on the data. For instance, a Burrows/Wheeler transform may be used to store a tree-like data structure abstractly equivalent to a prefix and/or suffix tree, in a compact format, such as in the space allocated for storing the reference genome. In various instances, the data stored is not in a tree-like structure, but rather the reference sequence data is in a linear list that may have been scrambled into a different order so as to transform it in a very particular way such that the accompanying algorithm allows the reference to be searched with reference to the sample reads so as to effectively walk the “tree”.
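
The following is a minimal sketch, in Python, of a Burrows-Wheeler transform and the associated backward search; it is offered only to illustrate how the transformed, linear representation of the reference may be "walked" in place of an explicit prefix/suffix tree, and the sentinel character, function names, and toy reference are hypothetical rather than those of any particular implementation described herein.

    def bwt(text: str) -> str:
        """Burrows-Wheeler transform via sorted rotations (text must end with a unique sentinel '$')."""
        rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
        return "".join(rot[-1] for rot in rotations)

    def fm_count(pattern: str, bwt_str: str) -> int:
        """Backward search: count occurrences of the pattern using only the BWT string."""
        first_col = sorted(bwt_str)
        # C[c] = number of characters in the text strictly smaller than c.
        C = {c: first_col.index(c) for c in set(bwt_str)}
        occ = lambda c, i: bwt_str[:i].count(c)   # rank of c within bwt_str[0:i]
        lo, hi = 0, len(bwt_str)
        for c in reversed(pattern):
            if c not in C:
                return 0
            lo = C[c] + occ(c, lo)
            hi = C[c] + occ(c, hi)
            if lo >= hi:
                return 0
        return hi - lo

    reference = "ACGTACGTACG$"
    print(fm_count("ACG", bwt(reference)))   # 3 occurrences of the seed "ACG"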

Additionally, in various instances, the index may include one or more hash tables, and the methods disclosed herein may include a hash function that may be performed on one or more portions of the reads in an effort to map the reads to the reference, e.g., to the index of the reference. For instance, alternatively, or in addition to utilizing one or both of a prefix/suffix tree and/or a Burrows/Wheeler transform on the reference genome and subject sequence data, so as to find where the one maps against the other, another such method involves the production of a hash table index and/or the performance of a hash function. The hash table index may be a large reference structure that is built up from sequences of the reference genome that may then be compared to one or more portions of the read to determine where the one may match to the other. Likewise, the hash table index may be built up from portions of the read that may then be compared to one or more sequences of the reference genome and thereby used to determine where the one may match to the other.

Implementation of a hash table is a fast method for performing a pattern match. Each lookup takes a nearly constant amount of time to perform. Such a method may be contrasted with the Burrows-Wheeler method, which may require many probes (the number may vary depending on how many bits are required to find a unique pattern) per query to find a match, or a binary search method that takes log2(N) probes, where N is the number of seed patterns in the table. Further, even though the hash function can break the reference genome down into segments of seeds of any given length, e.g., 28 base pairs, and can then convert the seeds into a digital, e.g., binary, representation of 56 bits, not all 56 bits need be accessed entirely at the same time or in the same way. For instance, the hash function can be implemented in such a manner that the address for each seed is designated by a number less than 56 bits, such as about 18 to about 44 or 46 bits, such as about 20 to about 40 bits, such as about 24 to about 36 bits, including about 28 to about 32, e.g., about 30, bits, which may be used as an initial key or address so as to access the hash table. For example, in certain instances, about 26 to about 29 bits may be used as a primary access key for the hash table, leaving about 27 to about 30 bits left over, which may be employed as a means for double checking the first key, e.g., if both the first and second keys arrive at the same cell in the hash table, then it is relatively clear that said location is where they belong.

For instance, a first portion of the digitally represented seed, e.g., about 26 to about 32, such as about 29, bits, can form a primary access key and be hashed and may be looked up in a first step. And, in a second step, the remaining about 27 to about 30 bits, e.g., a secondary access key, can be inserted into the hash table, such as in a hash chain, as a means for confirming the first pass. Accordingly, for any seed, its original address bits may be hashed in a first step, and the secondary address bits may be used in a second, confirmation step. In such an instance, the first portion of the seeds can be inserted into a primary record location, and the second portion may be fit into the table in a secondary record chain location. And, as indicated above, in various instances, these two different record locations may be positionally separated, such as by a chain format record.
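
A minimal sketch of this two-key arrangement is given below (in Python, with hypothetical bit widths of 30 primary bits and 26 secondary bits for a 28-base, 56-bit seed); the actual hardware partitioning and record format may differ.

    BASE_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}

    def encode_seed(seed: str) -> int:
        """Pack a 28-base seed into 56 bits, two bits per base."""
        value = 0
        for base in seed:
            value = (value << 2) | BASE_BITS[base]
        return value

    def split_keys(seed: str, primary_bits: int = 30):
        """Split the 56-bit seed into a primary hash-table address and a secondary confirmation key."""
        packed = encode_seed(seed)
        total_bits = 2 * len(seed)
        secondary_bits = total_bits - primary_bits
        primary = packed >> secondary_bits                 # upper bits address the hash table bucket
        secondary = packed & ((1 << secondary_bits) - 1)   # lower bits are stored and re-checked on lookup
        return primary, secondary

    seed = "ACGTACGTACGTACGTACGTACGTACGT"   # 28 bases -> 56 bits
    print(split_keys(seed))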

In particular instances, a brute force linear scan can be employed to compare the reference to the read, or portions thereof. However, using a brute force linear search to scan the reference genome for locations where a seed matches, over 3 billion locations may have to be checked. Such searching can be performed, in accordance with the methods disclosed herein, in software or hardware. Nevertheless, by using a hashing approach, as set forth herein, each seed lookup can occur in approximately a constant amount of time. Often, the location can be ascertained in a few accesses, e.g., a single access. However, in cases where multiple seeds map to the same location in the table, e.g., they are not unique enough, a few additional accesses may be made to find the seed being currently looked up. Hence, even though there can be 30M or more possible locations for a given 100 nucleotide length read to match up to, with respect to a reference genome, the hash table and hash function can quickly determine where that read is going to show up in the reference genome. By using a hash table index, therefore, it is not necessary to search the whole reference genome, e.g., by brute force, to determine where the read maps and aligns.

In view of the above, any suitable hash function may be employed for these purposes; however, in various instances, the hash function used to determine the table address for each seed may be a cyclic redundancy check (CRC) that may be based on a 2k-bit primitive polynomial, as indicated above. Alternatively, a trivial hash function mapper may be employed, such as by simply dropping some of the 2k bits. However, in various instances, the CRC may be a stronger hash function that may better separate similar seeds while at the same time avoiding table congestion. This may especially be beneficial where there is no speed penalty when calculating CRCs, such as with the dedicated hardware described herein. In such instances, the hash record populated for each seed may include the reference position where the seed occurred, and a flag indicating whether it was reverse complemented before hashing.
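
For illustration only, the following Python sketch implements a toy bitwise CRC over a 56-bit seed value, of the general kind that could serve as a seed-scrambling hash; the polynomial shown is arbitrary, is not asserted to be primitive, and is not the polynomial used by any hardware described herein.

    def crc_hash(value: int, width: int = 56, poly: int = 0x2F) -> int:
        """Toy CRC over a width-bit seed value: shift in each bit and reduce by a (hypothetical) polynomial."""
        crc = 0
        for i in reversed(range(width)):
            bit = (value >> i) & 1
            top = (crc >> (width - 1)) & 1
            crc = ((crc << 1) & ((1 << width) - 1)) | bit
            if top:
                crc ^= poly
        return crc

    # Similar seed values are dispersed to very different table addresses.
    print(hex(crc_hash(0x00FACEB00C1234)), hex(crc_hash(0x00FACEB00C1235)))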

The output returned from the performance of a mapping function may be a list of possibilities as to where one or more, e.g., each, read maps to one or more reference genomes. For instance, the output for each mapped read may be a list of possible locations where the read may be mapped to a matching sequence in the reference genome. In various embodiments, an exact match to the reference for at least a piece, e.g., a seed, of the read, if not all of the read, may be sought. Accordingly, in various instances, it is not necessary for all portions of all the reads to match exactly to all the portions of the reference genome.

As described herein, all of these operations may be performed via software or may be hardwired, such as into an integrated circuit, such as on a chip, for instance as part of a circuit board. For instance, the functioning of one or more of these algorithms may be embedded onto a chip, such as into an FPGA (field programmable gate array) or ASIC (application specific integrated circuit) chip, and may be optimized so as to perform more efficiently because of their implementation in such hardware. Additionally, one or more, e.g., two or all three, of these mapping functions may form a module, such as a mapping module, that may form part of a system, e.g., a pipeline, that is used in a process for determining an actual entire genomic sequence, or a portion thereof, of an individual.

An advantage of implementing the hash module in hardware is that the processes may be accelerated and therefore performed in a much faster manner. For instance, where software may include various instructions for performing one or more of these various functions, the implementation of such instructions often requires data and instructions to be stored and/or fetched and/or read and/or interpreted, such as prior to execution. As indicated above, however, and described in greater detail herein, a chip can be hardwired to perform these functions without having to fetch, interpret, and/or perform one or more of a sequence of instructions. Rather, the chip may be wired to perform such functions directly. Accordingly, in various aspects, the disclosure is directed to a custom hardwired machine that may be configured such that portions or all of the above described mapping, e.g., hashing, module may be implemented by one or more network circuits, such as integrated circuits hardwired on a chip, such as an FPGA or ASIC.

For example, in various instances, the hash table index may be constructed and the hash function may be performed on a chip, and in other instances, the hash table index may be generated off of the chip, such as via software run by a host CPU, but once generated it is loaded onto or otherwise made accessible to the hardware and employed by the chip, such as in running the hash module. Particularly, in various instances, the chip, such as an FPGA, may be configured so as to be tightly coupled to the host CPU, such as by a low latency interconnect, such as a QPI interconnect. More particularly, the chip and CPU may be configured so as to be tightly coupled together in such a manner so as to share one or more memory resources, e.g., a DRAM, in a cache coherent configuration, as described in more detail below. In such an instance, the host memory may build and/or include the reference index, e.g., the hash table, which may be stored in the host memory but be made readily accessible to the FPGA such as for its use in the performance of a hash or other mapping function. In particular embodiments, one or both of the CPU and the FPGA may include one or more caches or registers that may be coupled together so as to be in a coherent configuration such that stored data in one cache may be substantially mirrored by the other.

Accordingly, in view of the above, at run-time, one or more previously constructed hash tables, e.g., containing an index of a reference genome, or a constructed or to be constructed hash table, may be loaded into onboard memory or may at least be made accessible by its host application, as described in greater detail herein below. In such an instance, reads, e.g., stored in FASTQ file format, may be sent by the host application to the onboard processing engines, e.g., a memory or cache or other register associated therewith, such as for use by a mapping and/or alignment and/or sorting engine, such as where the results thereof may be sent to and used for performing a variant call function. With respect thereto, as indicated above, in various instances, a pile up of overlapping seeds may be generated, e.g., via a seed generation function, and extracted from the sequenced reads, or read-pairs, and once generated the seeds may be hashed, such as against an index, and looked up in the hash table so as to determine candidate read mapping positions in the reference.

More particularly, in various instances, a mapping module may be provided, such as where the mapping module is configured to perform one or more mapping functions, such as in a hardwired configuration. Specifically, the hardwired mapping module may be configured to perform one or more functions typically performed by one or more algorithms run on a CPU, such as the functions that would typically be implemented in a software based algorithm that produces a prefix and/or suffix tree, a Burrows-Wheeler Transform, and/or runs a hash function, for instance, a hash function that makes use of, or otherwise relies on, a hash-table indexing, such as of a reference, e.g., a reference genome sequence. In such instances, the hash function may be structured so as to implement a strategy, such as an optimized mapping strategy, that may be configured to minimize the number of memory accesses, e.g., large-memory random accesses, being performed so as to thereby maximize the utility of the on-board or otherwise associated memory bandwidth, which may fundamentally be constrained such as by space within the chip architecture.

Further, in certain instances, in order to make the system more efficient, the host CPU/GPU/QPU may be tightly coupled to the associated hardware, e.g., FPGA, such as by a low latency interface, e.g., Quick Path Interconnect (“QPI”), so as to allow the processing engines of the integrated circuit to have ready access to host memory. In particular instances, the interaction between the host CPU and the coupled chip and their respective associated memories, e.g., one or more DRAMs, may be configured so as to be cache coherent. Hence, in various embodiments, an integrated circuit may be provided wherein the integrated circuit has been pre-configured, e.g., prewired, in such a manner as to include one or more digital logic circuits that may be in a wired configuration, which may be interconnected, e.g., by one or a plurality of physical electrical interconnects, and in various embodiments, the hardwired digital logic circuits may be arranged into one or more processing engines so as to form one or more modules, such as a mapping module.

Accordingly, in various instances, a mapping module may be provided, such as in a first pre-configured wired, e.g., hardwired, configuration, where the mapping module is configured to perform various mapping functions. For instance, the mapping module may be configured so as to access at least some of a sequence of nucleotides in a read of a plurality of reads, derived from a subject's sequenced genetic sample, and/or a genetic reference sequence, and/or an index of one or more genetic reference sequences, from a memory or a cache associated therewith, e.g., via a memory interface, such as a process interconnect, for instance, a Quick Path Interconnect, and the like. The mapping module may further be configured for mapping the read to one or more segments of the one or more genetic reference sequences, such as based on the index. For example, in various particular embodiments, the mapping algorithm and/or module presented herein may be employed to build, or otherwise construct, a hash table whereby the read, or a portion thereof, of the sequenced genetic material from the subject may be compared with one or more segments of a reference genome, so as to produce mapped reads. In such an instance, once mapping has been performed, an alignment may be performed.

For example, after it has been determined where all the possible matches are for the seeds against the reference genome, it must be determined which out of all the possible locations a given read may match to is in fact the correct position to which it aligns. Hence, after mapping there may be a multiplicity of positions that one or more reads appear to match in the reference genome. Consequently, there may be a plurality of seeds that appear to be indicating the exact same thing, e.g., they may match to the exact same position on the reference, if you take into account the position of the seed in the read. The actual alignment, therefore, must be determined for each given read. This determination may be made in several different ways.

In one instance, all the reads may be evaluated so as to determine their correct alignment with respect to the reference genome based on the positions indicated by every seed from the read that returned position information during the mapping, e.g., hash lookup, process. However, in various instances, prior to performing an alignment, a seed chain filtering function may be performed on one or more of the seeds. For instance, in certain instances, the seeds associated with a given read that appear to map to the same general place as against the reference genome may be aggregated into a single chain that references the same general region. All of the seeds associated with one read may be grouped into one or more seed chains such that each seed is a member of only one chain. It is such chain(s) that then cause the read to be aligned to each indicated position in the reference genome.

Specifically, in various instances, all the seeds that have the same supporting evidence indicating that they all belong to the same general location(s) in the reference may be gathered together to form one or more chains. The seeds that group together, therefore, or at least appear as if they are going to be near one another in the reference genome, e.g., within a certain band, will be grouped into a chain of seeds, and those that are outside of this band will be made into a different chain of seeds. Once these various seeds have been aggregated into one or more various seed chains, it may be determined which of the chains actually represents the correct chain to be aligned. This may be done, at least in part, by use of a filtering algorithm that is a heuristic designed to eliminate weak seed chains which are highly unlikely to be the correct one.
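
A minimal sketch of such banded seed chaining and weak-chain filtering is given below (in Python, with a hypothetical band width and minimum chain length); the grouping and filtering heuristics actually employed may differ.

    def chain_seeds(seed_hits, band: int = 32, min_chain: int = 2):
        """Group seed hits whose implied read-start positions fall within 'band' bases of each other,
        then drop weak chains supported by fewer than 'min_chain' seeds."""
        # Each hit is (read_offset, reference_position); their difference approximates the read's start.
        diagonals = sorted(ref_pos - read_off for read_off, ref_pos in seed_hits)
        chains, current = [], [diagonals[0]]
        for d in diagonals[1:]:
            if d - current[-1] <= band:
                current.append(d)
            else:
                chains.append(current)
                current = [d]
        chains.append(current)
        return [c for c in chains if len(c) >= min_chain]

    # Three seeds agree on a region near reference position 1000; one stray seed forms a weak chain.
    hits = [(0, 1000), (10, 1010), (20, 1021), (5, 50000)]
    print(chain_seeds(hits))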

The outcome from performing one or more of these mapping, filtering, and/or editing functions is a list of reads, which includes for each read a list of all the possible locations to where the read may match up with the reference genome. Hence, a mapping function may be performed so as to quickly determine where the reads of the image file, BCL file, and/or FASTQ file obtained from the sequencer map to the reference genome, e.g., to where in the whole genome the various reads map. However, if there is an error in any of the reads or a genetic variation, an exact match to the reference may not be obtained and/or there may be several places where one or more reads appear to match. It, therefore, must be determined where the various reads actually align with respect to the genome as a whole.

Accordingly, after mapping and/or filtering and/or editing, the location positions for a large number of reads have been determined, where for some of the individual reads a multiplicity of location positions have been determined, and it now needs to be determined which out of all the possible locations is in fact the true or most likely location to which the various reads align. Such aligning may be performed by one or more algorithms, such as a dynamic programming algorithm that matches the mapped reads to the reference genome and runs an alignment function thereon. An exemplary aligning function compares one or more, e.g., all, of the reads to the reference, such as by placing them in a graphical relation to one another, e.g., such as in a table, e.g., a virtual array or matrix, where the sequence of one of the reference genome or the mapped reads is placed on one dimension or axis, e.g., the horizontal axis, and the other is placed on the opposing dimension or axis, such as the vertical axis. A conceptual scoring wave front is then passed over the array so as to determine the alignment of the reads with the reference genome, such as by computing alignment scores for each cell in the matrix.

The scoring wave front represents one or more, e.g., all, of the cells of a matrix, or a portion of those cells, which may be scored independently and/or simultaneously according to the rules of dynamic programming applicable in the alignment algorithm, such as Smith-Waterman, and/or Needleman-Wunsch, and/or related algorithms. Alignment scores may be computed sequentially or in other orders, such as by computing all the scores in the top row from left to right, followed by all the scores in the next row from left to right, etc. In this manner the diagonally sweeping wave front represents an optimal sequence of batches of scores computed simultaneously or in parallel in a series of wave front steps.

For instance, in one embodiment, a window of the reference genome containing the segment to which a read was mapped may be placed on the horizontal axis, and the read may be positioned on the vertical axis. In a manner such as this, an array or matrix is generated, e.g., a virtual matrix, whereby the nucleotide at each position in the read may be compared with the nucleotide at each position in the reference window. As the wave front passes over the array, all potential ways of aligning the read to the reference window are considered, including if changes to one sequence would be required to make the read match the reference sequence, such as by changing one or more nucleotides of the read to other nucleotides, or inserting one or more new nucleotides into one sequence, or deleting one or more nucleotides from one sequence.

An alignment score, representing the extent of the changes that would be required to be made to achieve an exact alignment, is generated, wherein this score and/or other associated data may be stored in the given cells of the array. Each cell of the array corresponds to the possibility that the nucleotide at its position on the read axis aligns to the nucleotide at its position on the reference axis, and the score generated for each cell represents the partial alignment terminating with the cell's positions in the read and the reference window. The highest score generated in any cell represents the best overall alignment of the read to the reference window. In various instances, the alignment may be global, where the entire read must be aligned to some portion of the reference window, such as using a Needleman-Wunsch or similar algorithm; or in other instances, the alignment may be local, where only a portion of the read may be aligned to a portion of the reference window, such as by using a Smith-Waterman or similar algorithm.
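
By way of illustration, the following Python sketch fills a Smith-Waterman style local alignment matrix for a read against a reference window and reports the highest-scoring cell; the scoring parameters and sequences shown are hypothetical and do not represent any particular hardware configuration described herein.

    def smith_waterman(read, ref_window, match=2, mismatch=-1, gap=-2):
        """Fill the local-alignment scoring matrix; the highest-scoring cell marks the best local alignment."""
        rows, cols = len(read) + 1, len(ref_window) + 1
        H = [[0] * cols for _ in range(rows)]
        best, best_cell = 0, (0, 0)
        for i in range(1, rows):
            for j in range(1, cols):
                diag = H[i-1][j-1] + (match if read[i-1] == ref_window[j-1] else mismatch)
                # A local alignment score never drops below zero.
                H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
                if H[i][j] > best:
                    best, best_cell = H[i][j], (i, j)
        return best, best_cell

    print(smith_waterman("GATTACA", "GCATGCGATTACATT"))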

Accordingly, in various instances, an alignment function may be performed, such as on the data obtained from the mapping module. Hence, in various instances, an alignment function may form a module, such as an alignment module, that may form part of a system, e.g., a pipeline, that is used, such as in addition to a mapping module, in a process for determining the actual entire genomic sequence, or a portion thereof, of an individual. For instance, the output returned from the performance of the mapping function, such as from a mapping module, e.g., the list of possibilities as to where one or more or all of the reads map to one or more positions in one or more reference genomes, may be employed by the alignment function so as to determine the actual sequence alignment of the subject's sequenced DNA.

Such an alignment function may at times be useful because, as described above, often times, for a variety of different reasons, the sequenced reads do not always match exactly to the reference genome. For instance, there may be an SNP (single nucleotide polymorphism) in one or more of the reads, e.g., a substitution of one nucleotide for another at a single position; there may be an “indel,” an insertion or deletion of one or more bases along one or more of the read sequences, which insertion or deletion is not present in the reference genome; and/or there may be a sequencing error (e.g., errors in sample prep and/or sequencer read and/or sequencer output, etc.) causing one or more of these apparent variations. Accordingly, when a read varies from the reference, such as by an SNP or indel, this may be because the reference differs from the true DNA sequence sampled, or because the read differs from the true DNA sequence sampled. The problem is to figure out how to correctly align the reads to the reference genome given the fact that in all likelihood the two sequences are going to vary from one another in a multiplicity of different ways.

In various instances, the input into an alignment function, such as from a mapping function, such as a prefix/suffix tree, or a Burrows/Wheeler transform, or a hash table and/or hash function, may be a list of possibilities as to where one or more reads may match to one or more positions of one or more reference sequences. For instance, for any given read, it may match any number of positions in the reference genome, such as at 1 location or 16, or 32, or 64, or 100, or 500, or 1,000 or more locations where a given read maps to in the genome. However, any individual read was derived, e.g., sequenced, from only one specific portion of the genome. Hence, in order to find the true location from where a given particular read was derived, an alignment function may be performed, e.g., a Smith-Waterman gapped or gapless alignment, a Needleman-Wunsch alignment, etc., so as to determine where in the genome one or more of the reads was actually derived, such as by comparing all of the possible locations where a match occurs and determining which of all the possibilities is the most likely location in the genome from which the read was sequenced, on the basis of which location's alignment score is greatest.

As indicated, typically, an algorithm is used to perform such an alignment function. For example, a Smith-Waterman and/or a Needleman-Wunsch alignment algorithm may be employed to align two or more sequences against one another. In this instance, they may be employed in a manner so as to determine the probabilities that, for any given position where the read maps to the reference genome, the mapping is in fact the position from where the read originated. Typically these algorithms are configured so as to be performed by software; however, in various instances, such as herein presented, one or more of these algorithms can be configured so as to be executed in hardware, as described in greater detail herein below.

In particular, the alignment function operates, at least in part, to align one or more, e.g., all, of the reads to the reference genome despite the presence of one or more portions of mismatches, e.g., SNPs, insertions, deletions, structural artifacts, etc., so as to determine where the reads are likely to fit in the genome correctly. For instance, the one or more reads are compared against the reference genome, and the best possible fit for the read against the genome is determined, while accounting for substitutions and/or indels and/or structural variants. However, to better determine which of the modified versions of the read best fits against the reference genome, the proposed changes must be accounted for, and as such a scoring function may also be performed.

For example, a scoring function may be performed, e.g., as part of an overall alignment function, whereby, as the alignment module performs its function and introduces one or more changes into a sequence being compared to another, e.g., so as to achieve a better or best fit between the two, for each change that is made so as to achieve the better alignment, a number is detracted from a starting score, e.g., either a perfect score or a zero starting score, in a manner such that as the alignment is performed the score for the alignment is also determined. For instance, where matches are detected the score is increased, and for each change introduced a penalty is incurred, and thus the best fit for the possible alignments can be determined, for example, by figuring out which of all the possible modified reads fits to the genome with the highest score. Accordingly, in various instances, the alignment function may be configured to determine the best combination of changes that need to be made to the read(s) to achieve the highest scoring alignment, which alignment may then be determined to be the correct or most likely alignment.

In view of the above, there are, therefore, at least two goals that may be achieved from performing an alignment function. One is a report of the best alignment, including position in the reference genome and a description of what changes are necessary to make the read match the reference segment at that position, and the other is the alignment quality score. For instance, in various instances, the output from the alignment module may be a Compact Idiosyncratic Gapped Alignment Report, e.g., a CIGAR string, wherein the CIGAR string output is a report detailing all the changes that were made to the reads so as to achieve their best fit alignment, e.g., detailed alignment instructions indicating how the query actually aligns with the reference. Such a CIGAR string readout may be useful in further stages of processing so as to better determine that, for the given subject's genomic nucleotide sequence, the predicted variations as compared against a reference genome are in fact true variations, and not just due to machine, software, or human error.
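
As a simple illustration of the CIGAR notation (a Python sketch; in practice the per-base operation string would come from the alignment traceback rather than being supplied by hand), runs of identical alignment operations are length-encoded:

    from itertools import groupby

    def to_cigar(operations: str) -> str:
        """Run-length encode per-base alignment operations (M = match/mismatch, I = insertion,
        D = deletion) into a CIGAR string."""
        return "".join(f"{len(list(group))}{op}" for op, group in groupby(operations))

    print(to_cigar("MMMMIMMDDMM"))   # prints "4M1I2M2D2M"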

As set forth above, in various embodiments, alignment is typically performed in a sequential manner, wherein the algorithm and/or firmware receives read sequence data, such as from a mapping module, pertaining to a read and one or more possible locations where the read may potentially map to the one or more reference genomes, and further receives genomic sequence data, such as from one or more memories, such as associated DRAMs, pertaining to the one or more positions in the one or more reference genomes to which the read may map. In particular, in various embodiments, the mapping module processes the reads, such as from a FASTQ file, and maps each of them to one or more positions in the reference genome to where they may possibly align. The aligner then takes these predicted positions and uses them to align the reads to the reference genome, such as by building a virtual array by which the reads can be compared with the reference genome.

In performing this function the aligner evaluates each mapped position for each individual read and particularly evaluates those reads that map to multiple possible locations in the reference genome and scores the possibility that each position is the correct position. It then compares the best scores, e.g., the two best scores, and makes a decision as to where the particular read actually aligns. For instance, in comparing the first and second best alignment scores, the aligner looks at the difference between the scores, and if the difference between them is great, then the confidence score that the one with the bigger score is correct will be high. However, where the difference between them is small, e.g., zero, then the confidence score in being able to tell from which of the two positions the read actually is derived is low, and more processing may be useful in being able to clearly determine the true location in the reference genome from where the read is derived.

Hence, the aligner in part is looking for the biggest difference between the first and second best alignment scores in making its call that a given read maps to a given location in the reference genome. Ideally, the score of the best possible choice of alignment is significantly greater than the score for the second best alignment for that sequence. There are many different ways an alignment scoring methodology may be implemented; for instance, each cell of the array may be scored, or a sub-portion of cells may be scored, such as in accordance with the methods disclosed herein. In various instances, scoring parameters for nucleotide matches, nucleotide mismatches, insertions, and deletions may have any various positive or negative or zero values. In various instances, these scoring parameters may be modified based on available information. For instance, accurate alignments may be achieved by making scoring parameters, including any or all of nucleotide match scores, nucleotide mismatch scores, gap (insertion and/or deletion) penalties, gap open penalties, and/or gap extend penalties, vary according to a base quality score associated with the current read nucleotide or position. For example, score bonuses and/or penalties could be made smaller when a base quality score indicates a high probability of a sequencing or other error being present. Base quality sensitive scoring may be implemented, for example, using a fixed or configurable lookup-table, accessed using a base quality score, which returns corresponding scoring parameters.
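
A minimal sketch of such base quality sensitive scoring via a lookup table is given below (in Python); the quality bins, Phred thresholds, and parameter values are hypothetical and serve only to illustrate how lower-confidence bases may receive smaller bonuses and penalties.

    # Hypothetical lookup table indexed by (binned) base quality score.
    QUALITY_SCORING = {
        "high":   {"match": 2, "mismatch": -3},   # e.g., Phred >= 30
        "medium": {"match": 2, "mismatch": -2},   # e.g., Phred 20-29
        "low":    {"match": 1, "mismatch": -1},   # e.g., Phred < 20
    }

    def bin_quality(phred: int) -> str:
        return "high" if phred >= 30 else "medium" if phred >= 20 else "low"

    def base_score(read_base: str, ref_base: str, phred: int) -> int:
        """Return the match bonus or mismatch penalty appropriate to the base's quality score."""
        params = QUALITY_SCORING[bin_quality(phred)]
        return params["match"] if read_base == ref_base else params["mismatch"]

    # A mismatch at a low-quality base is penalized less than one at a high-quality base.
    print(base_score("A", "C", phred=12), base_score("A", "C", phred=35))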

In a hardware implementation in an integrated circuit, such as an FPGA or ASIC, a scoring wave front may be implemented as a linear array of scoring cells, such as 16 cells, or 32 cells, or 64 cells, or 128 cells, or the like. Each of the scoring cells may be built of digital logic elements in a wired configuration to compute alignment scores. Hence, for each step of the wave front, for instance each clock cycle, or some other fixed or variable unit of time, each of the scoring cells, or a portion of the cells, computes the score or scores required for a new cell in the virtual alignment matrix. Notionally, the various scoring cells are considered to be in various positions in the alignment matrix, corresponding to a scoring wave front as discussed herein, e.g., along a straight line extending from bottom-left to top-right in the matrix. As is well understood in the field of digital logic design, the physical scoring cells and the digital logic they comprise need not be physically arranged in like manner on the integrated circuit.

Accordingly, as the wave front takes steps to sweep through the virtual alignment matrix, the notional positions of the scoring cells update correspondingly, each cell, for example, notionally "moving" a step to the right, or a step downward, in the alignment matrix. All scoring cells make the same relative notional movement, keeping the diagonal wave front arrangement intact. Each time the wave front moves to a new position, e.g., with a vertical downward step, or a horizontal rightward step in the matrix, the scoring cells arrive in new notional positions and compute alignment scores for the virtual alignment matrix cells they have entered. In such an implementation, neighboring scoring cells in the linear array are coupled to communicate query (read) nucleotides, reference nucleotides, and previously calculated alignment scores. The nucleotides of the reference window may be fed sequentially into one end of the wave front, e.g., the top-right scoring cell in the linear array, and may shift from there sequentially down the length of the wave front, so that at any given time a segment of reference nucleotides equal in length to the number of scoring cells is present within the cells, one successive nucleotide in each successive scoring cell.

For instance, each time the wave front steps horizontally, another reference nucleotide is fed into the top-right cell, and the other reference nucleotides shift down-left through the wave front. This shifting of reference nucleotides may be the underlying reality of the notional movement of the wave front of scoring cells rightward through the alignment matrix. Likewise, the nucleotides of the read may be fed sequentially into the opposite end of the wave front, e.g., the bottom-left scoring cell in the linear array, and shift from there sequentially up the length of the wave front, so that at any given time a segment of query nucleotides equal in length to the number of scoring cells is present within the cells, one successive nucleotide in each successive scoring cell. Similarly, each time the wave front steps vertically, another query nucleotide is fed into the bottom-left cell, and the other query nucleotides shift up-right through the wave front. This shifting of query nucleotides is the underlying reality of the notional movement of the wave front of scoring cells downward through the alignment matrix. Accordingly, by commanding a shift of reference nucleotides, the wave front may be moved a step horizontally, and by commanding a shift of query nucleotides, the wave front may be moved a step vertically. Hence, to produce generally diagonal wave front movement, such as to follow a typical alignment of query and reference sequences without insertions or deletions, wave front steps may be commanded in alternating vertical and horizontal directions.
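
The shift-register behavior described above can be illustrated with a small software model; the cell count and sequences below are arbitrary assumptions used only to show how horizontal and vertical steps shift reference and query bases in opposite directions.

```python
from collections import deque

CELLS = 8  # illustrative wavefront width

# One register per end of the wavefront: reference bases enter at the
# top-right and shift down-left; query bases enter at the bottom-left
# and shift up-right.
ref_window = deque(['-'] * CELLS, maxlen=CELLS)
query_window = deque(['-'] * CELLS, maxlen=CELLS)

def step_horizontal(next_ref_base: str) -> None:
    """Wavefront moves one column right: shift in a new reference base."""
    ref_window.appendleft(next_ref_base)

def step_vertical(next_query_base: str) -> None:
    """Wavefront moves one row down: shift in a new query base."""
    query_window.appendleft(next_query_base)

# Alternating steps give the notionally diagonal sweep described above.
for r, q in zip("ACGTACGT", "ACGAACGT"):
    step_horizontal(r)
    step_vertical(q)
print(list(ref_window), list(query_window))
```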

Accordingly, neighboring scoring cells in the linear array may be coupled to communicate previously calculated alignment scores. In various alignment scoring algorithms, such as Smith-Waterman or Needleman-Wunsch, or variants thereof, the alignment score(s) in each cell of the virtual alignment matrix may be calculated using previously calculated scores in other cells of the matrix, such as the three cells positioned immediately to the left of the current cell, above the current cell, and diagonally up-left of the current cell. When a scoring cell calculates new score(s) for another matrix position it has entered, it must retrieve such previously calculated scores corresponding to such other matrix positions. These previously calculated scores may be obtained from storage of previously calculated scores within the same cell, and/or from storage of previously calculated scores in the one or two neighboring scoring cells in the linear array. This is because the three contributing score positions in the virtual alignment matrix (immediately left, above, and diagonally up-left) would have been scored either by the current scoring cell or by one of its neighboring scoring cells in the linear array.
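
For reference, the three-neighbor dependency described here corresponds to the standard linear-gap recurrence, written below in generic textbook form (not a formula quoted from the disclosure), where $S(q_i, r_j)$ is the match/mismatch score and $g$ is the gap penalty:

$$
H_{i,j} = \max\begin{cases}
H_{i-1,j-1} + S(q_i, r_j) & \text{(diagonal: match or mismatch)}\\
H_{i-1,j} - g & \text{(vertical: gap step)}\\
H_{i,j-1} - g & \text{(horizontal: gap step)}\\
0 & \text{(zero floor, local alignment only)}
\end{cases}
$$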

For instance, the cell immediately to the left in the matrix would have been scored by the current scoring cell if the most recent wave front step was horizontal (rightward), or by the neighboring cell down-left in the linear array if the most recent wave front step was vertical (downward). Similarly, the cell immediately above in the matrix would have been scored by the current scoring cell if the most recent wave front step was vertical (downward), or by the neighboring cell up-right in the linear array if the most recent wave front step was horizontal (rightward). Particularly, the cell diagonally up-left in the matrix would have been scored by the current scoring cell if the most recent two wave front steps were in different directions, e.g., down then right, or right then down; by the neighboring cell up-right in the linear array if the most recent two wave front steps were both horizontal (rightward); or by the neighboring cell down-left in the linear array if the most recent two wave front steps were both vertical (downward).

Accordingly, by considering information on the last one or two wave front step directions, a scoring cell may select the appropriate previously calculated scores, accessing them within itself and/or within neighboring scoring cells, utilizing the coupling between neighboring cells. In a variation, scoring cells at the two ends of the wave front may have their outward score inputs hard-wired to invalid, or zero, or minimum-value scores, so that they will not affect new score calculations in these extreme cells. With a wave front thus implemented in a linear array of scoring cells, with coupling for shifting reference and query nucleotides through the array in opposing directions, in order to notionally move the wave front in vertical and horizontal, e.g., diagonal, steps, and with coupling for accessing scores previously computed by neighboring cells in order to compute alignment score(s) in new virtual matrix cell positions entered by the wave front, it is accordingly possible to score a band of cells in the virtual matrix, the width of the wave front, such as by commanding successive steps of the wave front to sweep it through the matrix.

For a new read and reference window to be aligned, therefore, the wave front may begin positioned inside the scoring matrix, or, advantageously, may gradually enter the scoring matrix from outside, beginning, e.g., to the left, or above, or diagonally left and above the top-left corner of the matrix. For instance, the wave front may begin with its top-left scoring cell positioned just left of the top-left cell of the virtual matrix, and the wave front may then sweep rightward into the matrix by a series of horizontal steps, scoring a horizontal band of cells in the top-left region of the matrix. When the wave front reaches a predicted alignment relationship between the reference and query, or when matching is detected from increasing alignment scores, the wave front may begin to sweep diagonally down-right, by alternating vertical and horizontal steps, scoring a diagonal band of cells through the middle of the matrix. When the bottom-left wave front scoring cell reaches the bottom of the alignment matrix, the wave front may begin sweeping rightward again by successive horizontal steps, until some or all wave front cells sweep out of the boundaries of the alignment matrix, scoring a horizontal band of cells in the bottom-right region of the matrix.

One or more of such alignment procedures may be performed by any suitable alignment algorithm, such as a Needleman-Wunsch alignment algorithm and/or a Smith-Waterman alignment algorithm that may have been modified to accommodate the functionality herein described. In general, both of these algorithms, and those like them, perform in a similar manner. For instance, as set forth above, these alignment algorithms typically build the virtual array in a similar manner, such that, in various instances, the horizontal top boundary may be configured to represent the genomic reference sequence, which may be laid out across the top row of the array according to its base pair composition. Likewise, the vertical boundary may be configured to represent the sequenced and mapped query sequences that have been positioned in order, downwards along the first column, such that their nucleotide sequence order is generally matched to the nucleotide sequence of the reference to which they mapped. The intervening cells may then be populated with scores as to the probability that the relevant base of the query at a given position is positioned at that location relative to the reference. In performing this function, a swath may be moved diagonally across the matrix, populating scores within the intervening cells, and the probability for each base of the query being in the indicated position may be determined.

With respect to a Needleman-Wunsch alignment function, which generates optimal global (or semi-global) alignments, aligning the entire read sequence to some segment of the reference genome, the wave front steering may be configured such that it typically sweeps all the way from the top edge of the alignment matrix to the bottom edge. When the wave front sweep is complete, the maximum score on the bottom edge of the alignment matrix (corresponding to the end of the read) is selected, and the alignment is backtraced to a cell on the top edge of the matrix (corresponding to the beginning of the read). In various of the instances disclosed herein, the reads can be of any length and any size, and there need not be extensive read parameters as to how the alignment is performed; e.g., in various instances, the read can be as long as a chromosome. In such an instance, however, the memory size and chromosome length may be the limiting factors.

With respect to a Smith-Waterman algorithm, which generates optimal local alignments, aligning the entire read sequence or part of the read sequence to some segment of the reference genome, this algorithm may be configured for finding the best possible scoring based on a full or partial alignment of the read. Hence, in various instances, the wave front-scored band may not extend to the top and/or bottom edges of the alignment matrix, such as if a very long read had seeds only in its middle mapping to the reference genome, but commonly the wave front may still score from top to bottom of the matrix. Local alignment is typically achieved by two adjustments. First, alignment scores are never allowed to fall below zero (or some other floor): if a cell score as otherwise calculated would be negative, a zero score is substituted, representing the start of a new alignment. Second, the maximum alignment score produced in any cell in the matrix, not necessarily along the bottom edge, is used as the terminus of the alignment. The alignment is backtraced from this maximum score up and left through the matrix to a zero score, which is used as the start position of the local alignment, even if it is not on the top row of the matrix.
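
A compact software model of those two local-alignment adjustments (the zero floor and the matrix-wide maximum) might look like the following; the scoring constants are illustrative assumptions.

```python
def smith_waterman_score(query: str, ref: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return the best local alignment score and its matrix cell (illustrative)."""
    rows, cols = len(query) + 1, len(ref) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_cell = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i-1][j-1] + (match if query[i-1] == ref[j-1] else mismatch)
            score = max(0,                # zero floor: start a new local alignment
                        diag,
                        H[i-1][j] + gap,  # vertical step
                        H[i][j-1] + gap)  # horizontal step
            H[i][j] = score
            if score > best:              # terminus may be anywhere in the matrix
                best, best_cell = score, (i, j)
    return best, best_cell

print(smith_waterman_score("ACACACTA", "AGCACACA"))
```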

In view of the above, there are several different possible pathways through the virtual array. In various embodiments, the wave front starts from the upper left corner of the virtual array and moves downwards toward the maximum score. For instance, the results of all possible alignments can be gathered, processed, correlated, and scored to determine the maximum score. When the end of a boundary or the end of the array has been reached, and/or a computation leading to the highest score for all of the processed cells is determined (e.g., the overall highest score identified), then a backtrace may be performed so as to find the pathway that was taken to achieve that highest score. For example, a pathway that leads to a predicted maximum score may be identified, and once identified, an audit may be performed so as to determine how that maximum score was derived, for instance by moving backwards, following the best-score alignment arrows, retracing the pathway that led to achieving the identified maximum score, such as calculated by the wave front scoring cells.

This backwards reconstruction, or backtrace, involves starting from a determined maximum score and working backward through the previous cells, navigating the path of cells having the scores that led to achieving the maximum score, all the way up the table and back to an initial boundary, such as the beginning of the array, or a zero score in the case of local alignment. During a backtrace, having reached a particular cell in the alignment matrix, the next backtrace step is to the neighboring cell, immediately leftward, or above, or diagonally up-left, which contributed the best score that was selected to construct the score in the current cell. In this manner, the evolution of the maximum score may be determined, thereby figuring out how the maximum score was achieved. The backtrace may end at a corner, or an edge, or a boundary, or may end at a zero score, such as in the upper left hand corner of the array. Accordingly, it is such a backtrace that identifies the proper alignment and thereby produces the CIGAR string readout that represents how the sample genomic sequence derived from the individual, or a portion thereof, matches to, or otherwise aligns with, the genomic sequence of the reference DNA.

Once it has been determined where each read is mapped, and further determined where each read is aligned, e.g., each relevant read has been given a position and a quality score reflecting the probability that the position is the correct alignment, such that the nucleotide sequence for the subject's DNA is known, then the order of the various reads and/or the genomic nucleic acid sequence of the subject may be verified, such as by performing a backtrace function moving backwards up through the array so as to determine the identity of every nucleic acid in its proper order in the sample genomic sequence. Consequently, in some aspects, the present disclosure is directed to a backtrace function, such as is part of an alignment module that performs both an alignment and a backtrace function, such as a module that may be part of a pipeline of modules, such as a pipeline that is directed at taking raw sequence read data, such as from a genomic sample from an individual, and mapping and/or aligning that data, which data may then be sorted.

To facilitate the backtrace operation, it is useful to store a scoring vector for each scored cell in the alignment matrix, encoding the score-selection decision. For classical Smith-Waterman and/or Needleman-Wunsch scoring implementations with linear gap penalties, the scoring vector can encode four possibilities, which may optionally be stored as a 2-bit integer from 0 to 3, for example: 0 = new alignment (null score selected); 1 = vertical alignment (score from the cell above selected, modified by gap penalty); 2 = horizontal alignment (score from the cell to the left selected, modified by gap penalty); 3 = diagonal alignment (score from the cell up and left selected, modified by nucleotide match or mismatch score). Optionally, the computed score(s) for each scored matrix cell may also be stored (in addition to the maximum achieved alignment score, which is standardly stored), but this is not generally necessary for backtrace and can consume large amounts of memory. Performing backtrace then becomes a matter of following the scoring vectors: when the backtrace has reached a given cell in the matrix, the next backtrace step is determined by the stored scoring vector for that cell, e.g.: 0 = terminate backtrace; 1 = backtrace upward; 2 = backtrace leftward; 3 = backtrace diagonally up-left.
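
As a rough software analogue of that 2-bit encoding, a backtrace loop that follows stored scoring vectors could be sketched as below; the dictionary-based storage stands in for the hardware table and is purely an assumption for illustration.

```python
# Scoring-vector codes as described above.
TERMINATE, VERTICAL, HORIZONTAL, DIAGONAL = 0, 1, 2, 3

def backtrace(vectors: dict[tuple[int, int], int], start: tuple[int, int]):
    """Follow 2-bit scoring vectors from the max-scoring cell back to a terminus.

    `vectors` maps (row, col) -> code; returns the visited cells in alignment order.
    """
    path = []
    i, j = start
    while True:
        path.append((i, j))
        code = vectors.get((i, j), TERMINATE)
        if code == TERMINATE:
            break
        elif code == VERTICAL:    # came from the cell above
            i -= 1
        elif code == HORIZONTAL:  # came from the cell to the left
            j -= 1
        else:                     # DIAGONAL: came from up-left
            i, j = i - 1, j - 1
    return list(reversed(path))

# Tiny illustrative table: a two-step diagonal alignment ending at (2, 2).
print(backtrace({(2, 2): DIAGONAL, (1, 1): DIAGONAL, (0, 0): TERMINATE}, (2, 2)))
```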

Such scoring vectors may be stored in a two-dimensional table arranged according to the dimensions of the alignment matrix, wherein only entries corresponding to cells scored by the wave front are populated. Alternatively, to conserve memory, more easily record scoring vectors as they are generated, and more easily accommodate alignment matrices of various sizes, scoring vectors may be stored in a table with each row sized to store the scoring vectors from a single wave front of scoring cells, e.g., 128 bits to store 64 2-bit scoring vectors from a 64-cell wave front, and with a number of rows equal to the maximum number of wave front steps in an alignment operation. Additionally, for this option, a record may be kept of the directions of the various wave front steps, e.g., by storing an extra, e.g., 129th, bit in each table row, encoding, e.g., 0 for a vertical wave front step preceding this wave front position, and 1 for a horizontal wave front step preceding this wave front position. This extra bit can be used during backtrace to keep track of which virtual scoring matrix positions the scoring vectors in each table row correspond to, so that the proper scoring vector can be retrieved after each successive backtrace step. When a backtrace step is vertical or horizontal, the next scoring vector should be retrieved from the previous table row, but when a backtrace step is diagonal, the next scoring vector should be retrieved from two rows previous, because the wave front had to take two steps to move from scoring any one cell to scoring the cell diagonally right-down from it.

In the case of affine gap scoring, scoring vector information may be extended, e.g., to 4 bits per scored cell. In addition to the, e.g., 2-bit score-choice direction indicator, two 1-bit flags may be added: a vertical extend flag and a horizontal extend flag. According to the methods of affine gap scoring extensions to Smith-Waterman or Needleman-Wunsch or similar alignment algorithms, for each cell, in addition to the primary alignment score representing the best-scoring alignment terminating in that cell, a 'vertical score' should be generated, corresponding to the maximum alignment score reaching that cell with a final vertical step, and a 'horizontal score' should be generated, corresponding to the maximum alignment score reaching that cell with a final horizontal step. When computing any of the three scores, a vertical step into the cell may be computed either using the primary score from the cell above minus a gap-open penalty, or using the vertical score from the cell above minus a gap-extend penalty, whichever is greater; and a horizontal step into the cell may be computed either using the primary score from the cell to the left minus a gap-open penalty, or using the horizontal score from the cell to the left minus a gap-extend penalty, whichever is greater. In cases where the vertical score minus a gap-extend penalty is selected, the vertical extend flag in the scoring vector should be set, e.g., to '1', and otherwise it should be unset, e.g., '0'.

In cases where the horizontal score minus a gap-extend penalty is selected, the horizontal extend flag in the scoring vector should be set, e.g., to '1', and otherwise it should be unset, e.g., '0'. During backtrace for affine gap scoring, any time the backtrace takes a vertical step upward from a given cell, if that cell's scoring vector's vertical extend flag is set, the following backtrace step must also be vertical, regardless of the scoring vector for the cell above. Likewise, any time the backtrace takes a horizontal step leftward from a given cell, if that cell's scoring vector's horizontal extend flag is set, the following backtrace step must also be horizontal, regardless of the scoring vector for the cell to the left. Accordingly, such a table of scoring vectors, e.g., 129 bits per row for 64 cells using linear gap scoring, or 257 bits per row for 64 cells using affine gap scoring, with some number NR of rows, is adequate to support backtrace after concluding alignment scoring where the scoring wave front took NR steps or fewer.
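
In the usual textbook (Gotoh-style) notation, with gap-open penalty $o$ and gap-extend penalty $e$, the computation described above corresponds to the following recurrences, writing the primary score as $H$, the vertical score as $V$, and the horizontal score as $F$ (a generic form consistent with the description, not a formula quoted from the disclosure):

$$
\begin{aligned}
V_{i,j} &= \max\big(H_{i-1,j} - o,\; V_{i-1,j} - e\big)\\
F_{i,j} &= \max\big(H_{i,j-1} - o,\; F_{i,j-1} - e\big)\\
H_{i,j} &= \max\big(H_{i-1,j-1} + S(q_i, r_j),\; V_{i,j},\; F_{i,j}\big)
\end{aligned}
$$

The corresponding extend flag is set exactly when the gap-extend ($-e$) branch is the one selected.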

For example, when aligning 300-nucleotide reads, the number of wave front steps required may always be less than 1024, so the table may be 257×1024 bits, or approximately 32 kilobytes, which in many cases may be a reasonable local memory inside the integrated circuit. But if very long reads are to be aligned, e.g., 100,000 nucleotides, the memory requirements for scoring vectors may be quite large, e.g., 8 megabytes, which may be very costly to include as local memory inside the integrated circuit. For such support, scoring vector information may be recorded to bulk memory outside the integrated circuit, e.g., DRAM, but then the bandwidth requirements, e.g., 257 bits per clock cycle per aligner module, may be excessive, which may create a bottleneck and dramatically reduce aligner performance. Accordingly, it is desirable to have a method for disposing of scoring vectors before completing the alignment, so that their storage requirements can be kept bounded, e.g., by performing incremental backtraces, generating incremental partial CIGAR strings, for example, from early portions of an alignment's scoring vector history, so that such early portions of the scoring vectors may then be discarded. The challenge is that the backtrace is supposed to begin in the alignment's terminal, maximum scoring cell, which is unknown until the alignment scoring completes, so any backtrace begun before alignment completes may begin from the wrong cell, not along the eventual final optimal alignment path.

Hence, a method is given for performing incremental backtrace from partial alignment information, e.g., comprising partial scoring vector information for alignment matrix cells scored so far. From a currently completed alignment boundary, e.g., a particular scored wave front position, backtrace is initiated from all cell positions on the boundary. Such backtrace from all boundary cells may be performed sequentially, or, advantageously, especially in a hardware implementation, all the backtraces may be performed together. It is not necessary to extract alignment notations, e.g., CIGAR strings, from these multiple backtraces; it is only necessary to determine what alignment matrix positions they pass through during the backtrace. In an implementation of simultaneous backtrace from a scoring boundary, a number of 1-bit registers may be utilized, corresponding to the number of alignment cells, initialized, e.g., all to '1's, representing whether any of the backtraces pass through a corresponding position. For each step of simultaneous backtrace, scoring vectors corresponding to all the current '1's in these registers, e.g., from one row of the scoring vector table, can be examined to determine a next backtrace step corresponding to each '1' in the registers, leading to a following position for each '1' in the registers for the next simultaneous backtrace step.

Importantly, it is easily possible for multiple '1's in the registers to merge into common positions, corresponding to multiple of the simultaneous backtraces merging together onto common backtrace paths. Once two or more of the simultaneous backtraces merge together, they remain merged indefinitely, because henceforth they will utilize scoring vector information from the same cell. It has been observed, empirically and for theoretical reasons, that with high probability, all of the simultaneous backtraces merge into a singular backtrace path in a relatively small number of backtrace steps, which, e.g., may be a small multiple, e.g., 8, times the number of scoring cells in the wave front. For example, with a 64-cell wave front, with high probability, all backtraces from a given wave front boundary merge into a single backtrace path within 512 backtrace steps. Alternatively, it is also possible, and not uncommon, for all backtraces to terminate within the number, e.g., 512, of backtrace steps.
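
A simplified software model of this merging behavior is sketched below; `vector_at` stands in for a lookup into the scoring-vector table and is an assumed helper, and tracking active positions in a set (rather than 1-bit registers) is a software convenience, not the hardware mechanism itself.

```python
def simultaneous_backtrace(boundary_cells, vector_at, max_steps=512):
    """Trace back from every boundary cell at once; paths that reach the same
    predecessor merge automatically because duplicate positions collapse."""
    active = set(boundary_cells)            # one '1' bit per boundary cell
    for _ in range(max_steps):
        if len(active) <= 1:
            return active                   # merged into one path (or all terminated)
        nxt = set()
        for (i, j) in active:
            code = vector_at(i, j)          # 0=stop, 1=up, 2=left, 3=diagonal
            if code == 1:
                nxt.add((i - 1, j))
            elif code == 2:
                nxt.add((i, j - 1))
            elif code == 3:
                nxt.add((i - 1, j - 1))     # terminated paths simply drop out
        active = nxt
    return active                           # >1 after max_steps is the exceptional case
```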

Accordingly, the multiple simultaneous backtraces may be performed from a scoring boundary, e.g., a scored wave front position, far enough back that they all either terminate or merge into a single backtrace path, e.g., in 512 backtrace steps or fewer. If they all merge together into a singular backtrace path, then from the location in the scoring matrix where they merge, or any distance further back along the singular backtrace path, an incremental backtrace from partial alignment information is possible. Further backtrace from the merge point, or any distance further back, is commenced by normal singular backtrace methods, including recording the corresponding alignment notation, e.g., a partial CIGAR string. This incremental backtrace, and, e.g., partial CIGAR string, must be part of any possible final backtrace, and, e.g., full CIGAR string, that would result after alignment completes, unless such final backtrace would terminate before reaching the scoring boundary where the simultaneous backtrace began; because if it reaches the scoring boundary, it must follow one of the simultaneous backtrace paths and merge into the singular backtrace path, now incrementally extracted.

Therefore, all scoring vectors for the matrix regions corresponding to the incrementally extracted backtrace, e.g., in all table rows for wave front positions preceding the start of the extracted singular backtrace, may be safely discarded. When the final backtrace is performed from a maximum scoring cell, if it terminates before reaching the scoring boundary (or, alternatively, if it terminates before reaching the start of the extracted singular backtrace), the incremental alignment notation, e.g., partial CIGAR string, may be discarded. If the final backtrace continues to the start of the extracted singular backtrace, its alignment notation, e.g., CIGAR string, may then be grafted onto the incremental alignment notation, e.g., partial CIGAR string. Furthermore, in a very long alignment, the process of performing a simultaneous backtrace from a scoring boundary, e.g., a scored wave front position, until all backtraces terminate or merge, followed by a singular backtrace with alignment notation extraction, may be repeated multiple times, from various successive scoring boundaries. The incremental alignment notation, e.g., partial CIGAR string, from each successive incremental backtrace may then be grafted onto the accumulated previous alignment notations, unless the new simultaneous backtrace or singular backtrace terminates early, in which case the accumulated previous alignment notations may be discarded. The eventual final backtrace likewise grafts its alignment notation onto the most recent accumulated alignment notations, for a complete backtrace description, e.g., CIGAR string.

Accordingly, in this manner, the memory to store scoring vectors may be kept bounded, assuming simultaneous backtraces always merge together in a bounded number of steps, e.g., 512 steps. In rare cases where simultaneous backtraces fail to merge or terminate in the bounded number of steps, various exceptional actions may be taken, including failing the current alignment, or repeating it with a higher bound or with no bound, perhaps by a different or traditional method, such as storing all scoring vectors for the complete alignment, such as in external DRAM. In a variation, it may be reasonable to fail such an alignment, because it is extremely rare, and even rarer that such a failed alignment would have been a best-scoring alignment to be used in alignment reporting.

In an optional variation, scoring vector storage may be divided, physically or logically, into a number of distinct blocks, e.g., of 512 rows each, and the final row in each block may be used as a scoring boundary to commence a simultaneous backtrace. Optionally, a simultaneous backtrace may be required to terminate or merge within the single block, e.g., within 512 steps. Optionally, if simultaneous backtraces merge in fewer steps, the merged backtrace may nevertheless be continued through the whole block, before commencing an extraction of a singular backtrace in the previous block. Accordingly, after scoring vectors are fully written to block N, and begin writing to block N+1, a simultaneous backtrace may commence in block N, followed by a singular backtrace and alignment notation extraction in block N−1. If the speed of the simultaneous backtrace, the singular backtrace, and alignment scoring are all similar or identical, and they can be performed simultaneously, e.g., in parallel hardware in an integrated circuit, then the singular backtrace in block N−1 may be simultaneous with scoring vectors filling block N+2, and when block N+3 is to be filled, block N−1 may be released and recycled.

Thus, in such an implementation, a minimum of 4 scoring vector blocks may be employed, and they may be utilized cyclically. Hence, the total scoring vector storage for an aligner module may be 4 blocks of 257×512 bits each, for example, or approximately 64 kilobytes. In a variation, if the current maximum alignment score corresponds to an earlier block than the current wave front position, this block and the previous block may be preserved rather than recycled, so that a final backtrace may commence from this position if it remains the maximum score; having an extra 2 blocks to keep preserved in this manner brings the minimum, e.g., to 6 blocks.
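
As a quick check of the quoted figure:

$$
257 \times 512 = 131{,}584\ \text{bits} \approx 16\ \text{KB per block},\qquad 4 \times 16\ \text{KB} \approx 64\ \text{KB}.
$$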

In another variation, to support overlapped alignments, with the scoring wave front crossing gradually from one alignment matrix to the next as described above, additional blocks, e.g., 1 or 2 additional blocks, may be utilized, e.g., 8 blocks total, e.g., approximately 128 kilobytes. Accordingly, if such a limited number of blocks, e.g., 4 blocks or 8 blocks, is used cyclically, alignment and backtrace of arbitrarily long reads is possible, e.g., 100,000 nucleotides, or an entire chromosome, without the use of external memory for scoring vectors. It is to be understood, such as with reference to the above, that although a mapping function may in some instances have been described, such as with reference to a mapper, and/or an alignment function may in some instances have been described, such as with reference to an aligner, these different functions may be performed sequentially by the same architecture, which has commonly been referenced in the art as an aligner. Accordingly, in various instances, both the mapping function and the aligning function, as herein described, may be performed by a common architecture that may be understood to be an aligner, especially in those instances wherein, to perform an alignment function, a mapping function need first be performed.

In various instances, the devices, systems, and their methods of use of the present disclosure may be configured for performing one or more of a full-read gapless and/or gapped alignment, which may then be scored so as to determine the appropriate alignment for the reads in the dataset. For instance, in various instances, a gapless alignment procedure may be performed on the data to be processed, which gapless alignment procedure may then be followed by one or more of a gapped alignment and/or a selective Smith-Waterman alignment procedure. For instance, in a first step, a gapless alignment chain may be generated. As described herein, such gapless alignment functions may be performed quickly, such as without the need for accounting for gaps, and after a first step of performing a gapless alignment, a gapped alignment may then be performed.

For example, an alignment function may be performed in order to determine how any given nucleotide sequence, e.g., a read, aligns to a reference sequence without the need for inserting gaps in one or more of the reads and/or the reference. An important part of performing such an alignment function is determining where and how there are mismatches in the sequence in question versus the sequence of the reference genome. However, because of the great homology within the human genome, in theory, any given nucleotide sequence is going to largely match a representative reference sequence. Where there are mismatches, these will likely be due to a single nucleotide polymorphism, which is relatively easy to detect, or they will be due to an insertion or deletion in the sequences in question, which are much more difficult to detect.

Consequently, in performing an alignment function, the majority of the time the sequence in question is going to match the reference sequence, and where there is a mismatch due to a SNP, this will easily be determined. Hence, a relatively large amount of processing power is not required to perform such analysis. Difficulties arise, however, where there are insertions or deletions in the sequence in question with respect to the reference sequence, because such insertions and deletions amount to gaps in the alignment. Such gaps require a more extensive and complicated processing platform so as to determine the correct alignment. Nevertheless, because there will only be a small percentage of indels, only a relatively smaller percentage of gapped alignment protocols need be performed as compared to the millions of gapless alignments performed. Hence, only a small percentage of all of the gapless alignment functions result in a need for further processing due to the presence of an indel in the sequence, and therefore will need a gapped alignment.

When an indel is indicated in a gapless alignment procedure, only those sequences get passed on to an alignment engine for further processing, such as an alignment engine configured for performing an advanced alignment function, such as a Smith-Waterman alignment (SWA). Thus, because either a gapless or a gapped alignment is to be performed, the devices and systems disclosed herein make a much more efficient use of resources. More particularly, in certain embodiments, both a gapless and a gapped alignment may be performed on a given selection of sequences, e.g., one right after the other; the results are then compared for each sequence, and the best result is chosen. Such an arrangement may be implemented, for instance, where an enhancement in accuracy is desired, and an increased amount of time and resources for performing the required processing is acceptable.

Particularly, in various instances, a first alignment step may be performed without engaging a processing-intensive Smith-Waterman function. Hence, a plurality of gapless alignments may be performed in a less resource-intensive, less time-consuming manner, and because fewer resources are needed, less space need be dedicated to such processing on the chip. Thus, more processing may be performed using fewer processing elements and requiring less time; therefore, more alignments can be done, and better accuracy can be achieved. More particularly, fewer chip resources need be dedicated to performing Smith-Waterman alignments, because the processing elements required to perform gapless alignments do not require as much chip area as those required to perform a gapped alignment. As the chip resource requirements go down, more processing can be performed in a shorter period of time, and with more processing, better accuracy can be achieved.

Accordingly, in such instances, a gapless alignment protocol, e.g., to be performed by suitably configured gapless alignment resources, may be employed. For example, as disclosed herein, in various embodiments, an alignment processing engine is provided, such as where the processing engine is configured for receiving digital signals, e.g., representing one or more reads of genomic data, such as digital data denoting one or more nucleotide sequences, from an electronic data source, and mapping and/or aligning that data to a reference sequence, such as by first performing a gapless alignment function on that data, which gapless alignment function may then be followed, if necessary, by a gapped alignment function, such as by performing a Smith-Waterman alignment protocol.

Consequently, in various instances, a gapless alignment function is performed on a contiguous portion of the read, e.g., employing a gapless aligner, and if the gapless alignment goes from end to end, e.g., the read is complete, a gapped alignment is not performed. However, if the results of the gapless alignment are indicative of there being an indel present, e.g., the read is clipped or otherwise incomplete, then a gapped alignment may be performed. Thus, the ungapped alignment results may be used to determine if a gapped alignment is needed, for instance, where the ungapped alignment is extended into a gap region but does not extend the entire length of the read, such as where the read may be clipped, e.g., soft clipped to some degree, and where clipped, a gapped alignment may then be performed.

Hence, in various embodiments, based on the completeness and alignment scores, it is only if the gapless alignment ends up being clipped, e.g., does not go end to end, that a gapped alignment is performed. More particularly, in various embodiments, the best identifiable gapless and/or gapped alignment score may be estimated and used as a cutoff line for deciding if the score is good enough to warrant further analysis, such as by performing a gapped alignment. Thus, the completeness of the alignment, and its score, may be employed such that a high score is indicative of the alignment being complete, and therefore ungapped, and a lower score is indicative of the alignment not being complete, and of a gapped alignment needing to be performed. Hence, where a high score is attained, a gapped alignment is not performed; only when the score is low enough is the gapped alignment performed. Of course, in various instances a brute force alignment approach may be employed, such that a number of gapped and/or gapless aligners are deployed in the chip architecture, so as to allow a greater number of alignments to be performed, and thus a larger amount of data to be examined.
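
The dispatch logic described above could be modeled in software roughly as follows; the threshold value and field names are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class GaplessResult:
    score: int
    end_to_end: bool   # True if the read aligned without clipping

# Illustrative cutoff: below this fraction of the best achievable score,
# escalate to a gapped (e.g., Smith-Waterman) alignment.
SCORE_CUTOFF_FRACTION = 0.9

def needs_gapped_alignment(result: GaplessResult, max_possible_score: int) -> bool:
    """Decide whether a read should be escalated from gapless to gapped alignment."""
    if result.end_to_end and result.score >= SCORE_CUTOFF_FRACTION * max_possible_score:
        return False          # complete, high-scoring gapless alignment suffices
    return True               # clipped or low-scoring: run the gapped aligner

print(needs_gapped_alignment(GaplessResult(score=280, end_to_end=True), 300))   # False
print(needs_gapped_alignment(GaplessResult(score=180, end_to_end=False), 300))  # True
```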

More particularly, in various embodiments, each mapping and/or aligning engine may include one or more, e.g., two, Smith-Waterman aligner modules. In certain instances, these modules may be configured so as to support global (end-to-end) gapless alignment and/or local (clipped) gapped alignment, to perform affine gap scoring, and to generate unclipped score bonuses at each end. Base-quality-sensitive match and mismatch scoring may also be supported. Where two alignment modules are included, e.g., as part of the integrated circuit, each Smith-Waterman aligner may be constructed as an anti-diagonal wave front of scoring cells, which wave front 'moves' through a virtual alignment rectangle, scoring the cells that it sweeps through.

However, for longer reads, the Smith-Waterman wave front may also be configured to support automatic steering, so as to track the best alignment through accumulated indels, such as to ensure that the alignment wave front and the cells being scored do not escape the scoring band. In the background, logic engines may be configured to examine current wave front scores, find the maximums, flag the subset of cells more than a threshold distance below the maximum, and target the midpoint between the two extreme flags. In such an instance, auto-steering may be configured to run diagonally when the target is at the wave front center, but may be configured to run straight horizontally or vertically as needed to re-center the target if it drifts, such as due to the presence of indels.

The output from the alignment module is a SAM (text) or BAM (e.g., binary version of a SAM) file along with a mapping quality score (MAPQ), which quality score reflects the confidence that the predicted and aligned location of the read to the reference is actually where the read is derived from. Accordingly, once it has been determined where each read is mapped, and further determined where each read is aligned, e.g., each relevant read has been given a position and a quality score reflecting the probability that the position is the correct alignment, such that the nucleotide sequence for the subject's DNA is known, as well as how the subject's DNA differs from that of the reference (e.g., the CIGAR string has been determined), then the various reads representing the genomic nucleic acid sequence of the subject may be sorted by chromosome location, so that the exact location of the read on the chromosomes may be determined. Consequently, in some aspects, the present disclosure is directed to a sorting function, such as may be performed by a sorting module, which sorting module may be part of a pipeline of modules, such as a pipeline that is directed at taking raw sequence read data, such as from a genomic sample from an individual, and mapping and/or aligning that data, which data may then be sorted.

More particularly, once the reads have been assigned a position, such as relative to the reference genome, which may include identifying to which chromosome the read belongs and/or its offset from the beginning of that chromosome, the reads may be sorted by position. Sorting may be useful, such as in downstream analyses, whereby all of the reads that overlap a given position in the genome may be formed into a pileup so as to be adjacent to one another, such as after being processed through the sorting module, whereby it can be readily determined if the majority of the reads agree with the reference value or not. Hence, where the majority of reads do not agree with the reference value, a variant call can be flagged. Sorting, therefore, may involve sorting the reads that align to relatively the same position, such as the same chromosome position, so as to produce a pileup, such that all the reads that cover the same location are physically grouped together; and may further involve analyzing the reads of the pileup to determine where the reads may indicate an actual variant in the genome, as compared to the reference genome, which variant may be distinguishable, such as by the consensus of the pileup, from an error, such as a machine read error or an error in the sequencing methods, which may be exhibited by a small minority of the reads.
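
A minimal illustration of sorting aligned reads by position and grouping them into per-locus pileups might look like this; the record fields are assumptions for illustration, not the disclosure's data format.

```python
from collections import defaultdict

# (chromosome, 0-based position, base observed at that position) - illustrative records
aligned_reads = [
    ("chr1", 1001, "A"), ("chr1", 1000, "G"), ("chr2", 50, "T"),
    ("chr1", 1000, "G"), ("chr1", 1000, "T"),
]

# Sort by chromosome then offset, as a sorting module would.
aligned_reads.sort(key=lambda r: (r[0], r[1]))

# Group reads covering the same locus into a pileup.
pileups = defaultdict(list)
for chrom, pos, base in aligned_reads:
    pileups[(chrom, pos)].append(base)

print(dict(pileups))
# {('chr1', 1000): ['G', 'G', 'T'], ('chr1', 1001): ['A'], ('chr2', 50): ['T']}
```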

Once the data has been obtained, there are one or more other modules that may be run so as to clean up the data. For instance, one module that may be included, for example, in a sequence analysis pipeline, such as for determining the genomic sequence of an individual, may be a local realignment module. For example, it is often difficult to determine insertions and deletions that occur at the end of the read. This is because the Smith-Waterman or equivalent alignment process lacks enough context beyond the indel to allow the scoring to detect its presence. Consequently, the actual indel may be reported as one or more SNPs. In such an instance, the accuracy of the predicted location for any given read may be enhanced by performing a local realignment on the mapped and/or aligned and/or sorted read data.

In such instances, pileups may be used to help clarify the proper alignment: where a position in question is at the end of any given read, that same position is likely to be in the middle of some other read in the pileup. Accordingly, in performing a local realignment, the various reads in a pileup may be analyzed to determine whether some of the reads in the pileup indicate that there was an insertion or a deletion at a given position where another read does not include the indel, or rather includes a substitution, at that position. If so, the indel may be inserted, such as into the reference, where it is not present, and the reads in the local pileup that overlap that region may be realigned to see if, collectively, a better score is achieved than when the insertion and/or deletion was not there. If there is an improvement, the whole set of reads in the pileup may be reviewed, and if the score of the overall set has improved, then it is clear to make the call that there really was an indel at that position. In a manner such as this, the fact that there is not enough context to more accurately align a read at the end of a chromosome, for any individual read, may be compensated for. Hence, when performing a local realignment, one or more pileups where one or more indels may be positioned are examined, and it is determined whether adding an indel at any given position enhances the overall alignment score.

Another module that may be included, for example, in a sequence analysis pipeline, such as for determining the genomic sequence of an individual, may be a duplicate marking module. For instance, a duplicate marking function may be performed so as to compensate for chemistry errors that may occur during the sequencing phase. For example, as described above, during some sequencing procedures nucleic acid sequences are attached to beads and built up from there using labeled nucleotide bases. Ideally there will be only one read per bead. However, sometimes multiple reads become attached to a single bead, and this results in an excessive number of copies of the attached read. This phenomenon is known as read duplication.

After an alignment is performed and the results obtained, and/or a sorting function, local realignment, and/or a de-duplication is performed, a variant call function may be employed on the resultant data. For instance, a typical variant call function, or parts thereof, may be configured so as to be implemented in a software and/or hardwired configuration, such as on an integrated circuit. Particularly, variant calling is a process that involves positioning all the reads that align to a given location on the reference into groupings such that all overlapping regions from all the various aligned reads form a "pile up." The pileup of reads covering a given region of the reference genome is then analyzed to determine what the most likely actual content of the sampled individual's DNA/RNA is within that region. This is then repeated, stepwise, for every region of the genome. The determined content generates a list of differences, termed "variations" or "variants," from the reference genome, each with an associated confidence level along with other metadata.

The most common variants are single nucleotide polymorphisms (SNPs), in which a single base differs from the reference. SNPs occur at about 1 in 1000 positions in a human genome. Next most common are insertions (into the reference) and deletions (from the reference), or "indels" collectively. These are more common at shorter lengths, but can be of any length. Additional complications arise, however: because the collection of sequenced segments ("reads") is random, some regions will have deeper coverage than others. There are also more complex variants that include multi-base substitutions, and combinations of indels and substitutions that can be thought of as length-altering substitutions. Standard software-based variant callers have difficulty identifying all of these, and operate with various limits on variant lengths. More specialized variant callers in software and/or hardware are needed to identify longer variations, and the many varieties of exotic "structural variants" involving large alterations of the chromosomes.

However, variant calling is a difficult procedure to implement in software, and far more difficult to deploy in hardware. In order to account for and/or detect these types of errors, typical variant callers may perform one or more of the following tasks. For instance, they may come up with a set of hypothesis genotypes (content of the one or two chromosomes at a locus), use Bayesian calculations to estimate the posterior probability that each genotype is the truth given the observed evidence, and report the most likely genotype along with its confidence level. Such variant callers may be simple or complex. Simpler variant callers look only at the column of bases in the aligned read pileup at the precise position of a call being made. More advanced variant callers are "haplotype-based callers," which may be configured to take into account context, such as in a window, around the call being made.
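
The Bayesian step referred to here is, in its generic textbook form, an application of Bayes' rule over candidate genotypes $G$ given the pileup evidence $D$ (a standard statement, not a formula quoted from the disclosure):

$$
P(G \mid D) \;=\; \frac{P(D \mid G)\,P(G)}{\sum_{G'} P(D \mid G')\,P(G')},
$$

where the reported call is the genotype maximizing $P(G \mid D)$ and the confidence level derives from that posterior.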

A "haplotype" is particular DNA content (nucleotide sequence, list of variants, etc.) in a single common "strand," e.g., one of two diploid strands in a region, and a haplotype-based caller considers the Bayesian implications of which differences are linked by appearing in the same read. Accordingly, a variant call protocol, as proposed herein, may implement one or more improved functions such as those performed in a Genome Analysis Tool Kit (GATK) haplotype caller and/or using a Hidden Markov Model (HMM) tool and/or a De Bruijn graph function, such as where one or more of these functions, typically employed by a GATK haplotype caller, and/or an HMM tool, and/or a De Bruijn graph function, may be implemented in software and/or in hardware.

More particularly, as implemented herein, various different variant call operations may be configured so as to be performed in software or hardware, and may include one or more of the following steps. For instance, the variant call function may include an active region identification, such as for identifying places where multiple reads disagree with the reference, and for generating a window around the identified active region, so that only these regions may be selected for further processing. Additionally, localized haplotype assembly may take place, such as where, for each given active region, all the overlapping reads may be assembled into a "De Bruijn graph" (DBG) matrix. From this DBG, various paths through the matrix may be extracted, where each path constitutes a candidate haplotype, e.g., a hypothesis, for what the true DNA sequence may be on at least one strand. Further, haplotype alignment may take place, such as where each extracted haplotype candidate may be aligned, e.g., Smith-Waterman aligned, back to the reference genome, so as to determine what variation(s) from the reference it implies. Furthermore, a read likelihood calculation may be performed, such as where each read may be tested against each haplotype, or hypothesis, to estimate a probability of observing the read assuming the haplotype was the true original DNA sampled.

With respect to these processes, the read likelihood calculation will typically be the most resource-intensive and time-consuming operation to be performed, often requiring a pair HMM evaluation. Additionally, the constructing of De Bruijn graphs for each pileup of reads, with the associated operations of identifying locally and globally unique K-mers, as described below, may also be resource intensive and/or time consuming. Accordingly, in various embodiments, one or more of the various calculations involved in performing one or more of these steps may be configured so as to be implemented in an optimized software fashion or in hardware, such as for being performed in an accelerated manner by an integrated circuit, as herein described.

As indicated above, in various embodiments, a Haplotype Caller of the disclosure, implemented in software and/or in hardware or a combination thereof, may be configured to include one or more of the following operations: Active Region Identification, Localized Haplotype Assembly, Haplotype Alignment, Read Likelihood Calculation, and/or Genotyping. For instance, the devices, systems, and/or methods of the disclosure may be configured to perform one or more of a mapping, aligning, and/or a sorting operation on data obtained from a subject's sequenced DNA/RNA to generate mapped, aligned, and/or sorted results data. This results data may then be cleaned up, such as by performing a de-duplication operation on it, and/or that data may be communicated to one or more dedicated haplotype caller processing engines for performing a variant call operation, including one or more of the aforementioned steps, on that results data so as to generate a variant call file with respect thereto. Hence, all the reads that have been sequenced and/or been mapped and/or aligned to particular positions in the reference genome may be subjected to further processing so as to determine how the determined sequence differs from a reference sequence at any given point in the reference genome.

Accordingly, in various embodiments, a device, system, and/or method of its use, as herein disclosed, may include a variant or haplotype caller system that is implemented in a software and/or hardwired configuration to perform an active region identification operation on the obtained results data. Active region identification involves identifying and determining places where multiple reads, e.g., in a pile up of reads, disagree with a reference, and further involves generating one or more windows around the disagreements ("active regions") such that the region within the window may be selected for further processing. For example, during a mapping and/or aligning step, identified reads are mapped and/or aligned to the regions in the reference genome where they are expected to have originated in the subject's genetic sequence.

However, as the sequencing is performed in such a manner so as to create an oversampling of sequenced reads for any given region of the genome, at any given position in the reference sequence may be seen a pile up of any and all of the sequenced reads that line up and align with that region. All of these reads that align and/or overlap in a given region or pile up position may be input into the variant caller system. Hence, for any given read being analyzed, the read may be compared to the reference at its suspected region of overlap, and that read may be compared to the reference to determine if it shows any difference in its sequence from the known sequence of the reference. If the read lines up to the reference, without any insertions or deletions, and all the bases are the same, then the alignment is determined to be good.

Hence, for any given mapped and/or aligned read, the read may have bases that are different from the reference, e.g., the read may include one or more SNPs, creating a position where a base is mismatched; and/or the read may have one or more of an insertion and/or deletion, e.g., creating a gap in the alignment. Accordingly, in any of these instances, there will be one or more mismatches that need to be accounted for by further processing. Nevertheless, to save time and increase efficiency, such further processing should be limited to those instances where a perceived mismatch is non-trivial, e.g., a non-noise difference. In determining the significance of a mismatch, places where multiple reads in a pile up disagree with the reference may be identified as an active region, and a window around the active region may then be used to select a locus of disagreement that may then be subjected to further processing. The disagreement, however, should be non-trivial. This may be determined in many ways; for instance, the non-reference probability may be calculated for each locus in question, such as by analyzing base match vs. mismatch quality scores, such as above a given threshold deemed to be a sufficiently significant amount of indication from those reads that disagree with the reference in a significant way.

For instance, if 30 of the mapped and/or aligned reads all line up and/or overlap so as to form a pile up at a given position in the reference, e.g., an active region, and only 1 or 2 out of the 30 reads disagree with the reference, then the minimal threshold for further processing may be deemed not to have been met, and the non-agreeing read(s) can be disregarded in view of the 28 or 29 reads that do agree. However, if 3 or 4, or 5, or 10, or more of the reads in the pile up disagree, then the disagreement may be statistically significant enough to warrant further processing, and an active region around the identified region(s) of difference might be determined. In such an instance, an active region window ascertaining the bases surrounding that difference may be taken to give enhanced context to the region surrounding the difference, and additional processing steps, such as performing a Gaussian distribution and a sum of non-reference probabilities distributed across neighboring positions, may be taken to further investigate and process that region to figure out whether an active region should be declared and, if so, what variances from the reference are actually present within that region, if any. Therefore, the determining of an active region identifies those regions where extra processing may be needed to clearly determine whether a true variance or a read error has occurred.
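
A toy model of that thresholding step is shown below; the 10% disagreement cutoff and window radius are illustrative assumptions consistent with the 1-2 versus 3-or-more out of 30 example above.

```python
def find_active_regions(pileup_disagreements, depth, window_radius=50,
                        min_fraction=0.10):
    """Return (start, end) windows around loci whose disagreement rate is significant.

    pileup_disagreements: dict mapping reference position -> number of reads
    disagreeing with the reference at that position; depth: pileup depth.
    All parameter values are illustrative, not values from the disclosure.
    """
    regions = []
    for pos, n_disagree in sorted(pileup_disagreements.items()):
        if n_disagree / depth >= min_fraction:          # e.g. 3+ of 30 reads disagree
            regions.append((pos - window_radius, pos + window_radius))
    return regions

# 2/30 at position 1000 is ignored; 5/30 at position 2000 triggers a window.
print(find_active_regions({1000: 2, 2000: 5}, depth=30))
# -> [(1950, 2050)]
```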

Particularly, because in many instances it is not desirable to subject every region in a pile up of sequences to further processing, an active region can be identified whereby only those regions where extra processing may be needed to clearly determine if a true variance or a read error has occurred are determined to be in need of further processing. And, as indicated above, it may be the size of the supposed variance that determines the size of the window of the active region. For instance, in various instances, the bounds of the active window may vary from 1 or 2 or about 10 or 20 or even about 25 or about 50 to about 200 or about 300, or about 500 or about 1000 bases long or more, where it is only within the bounds of the active window that further processing takes place. Of course, the size of the active window can be any suitable length so long as it provides the context to determine the statistical importance of a difference.

Hence, if there are only one or two isolated differences, then the active window may only need to cover one to a few dozen bases in the active region so as to have enough context to make a statistical call that an actual variant is present. However, if there is a cluster or a bunch of differences, or if there are indels present for which more context is desired, then the window may be configured so as to be larger. In either instance, it may be desirable to analyze any and all of the differences that might occur in clusters together, in one or more active regions, because doing so can provide supporting information about each individual difference and will save processing time by decreasing the number of active windows engaged. In various instances, the active region boundaries may be determined by active probabilities that pass a given threshold, such as about 0.00001 or about 0.0001 or less to about 0.002 or about 0.02 or about 0.2 or more. And if the active region is longer than a given threshold, e.g., about 300-500 bases or 1000 bases or more, then the region can be broken up into sub-regions, such as sub-regions defined by splitting at the locus with the lowest active probability score.

In various instances, after an active region is identified, a localized haplotype assembly procedure may be performed. For instance, in each active region, all the piled up and/or overlapping reads may be assembled into a “De Bruijn Graph” (DBG). A DBG may be a directed graph based on all the reads that overlapped the selected active region, which active region may be about 200 or about 300 to about 400 or about 500 bases long or more, within which active region the presence and/or identity of variants are to be determined. In various instances, as indicated above, the active region can be extended, e.g., by including another about 100 or about 200 or more bases in each direction of the locus in question so as to generate an extended active region, such as where additional context surrounding a difference may be desired. Accordingly, it is from the active region window, extended or not, that all of the reads that have portions overlapping the active region are piled up, e.g., to produce a pileup, the overlapping portions are identified, and the read sequences are threaded into the haplotype caller system and are thereby assembled together in the form of a De Bruijn graph, much like the pieces of a puzzle.

Accordingly, for any given active window there will be reads that form a pile up such that, en masse, the pile up will include a sequence pathway through which the overlapping regions of the various overlapping reads in the pile up cover the entire sequence within the active window. Hence, at any given locus in the active region, there will be a plurality of reads overlapping that locus, although any given read may not extend across the entire active region. The result of this is that various regions of various reads within a pileup are employed by the DBG in determining whether a variant actually is present or not for any given locus in the sequence within the active region. As it is within the active window that this determination is being made, it is those portions of any given read within the borders of the active window that are considered, and those portions that are outside of the active window may be discarded.

As indicated, it is those sections of the reads that overlap the reference within the active region that are fed into the DBG system. The DBG system then assembles the reads like a puzzle into a graph, and then, for each position in the sequence, it is determined, based on the collection of overlapping reads for that position, whether there is a match or a mismatch for any given read, and if there is a mismatch, what the probability of that mismatch is. For instance, where there are discrete places where segments of the reads in the pile up overlap each other, they may be aligned to one another based on their areas of matching, and by stringing or stitching the matching reads together, as determined by their points of matching, it can be established, for each position within that segment, whether and to what extent the reads at any given position match or mismatch each other. Hence, if two or more reads being compiled line up and match each other identically for a while, a graph having a single string will result; however, when the two or more reads come to a point of difference, a branch in the graph will form, and two or more divergent strings will result, until matching between the two or more reads resumes.

Hence, the pathways through the graph are often not a straight line. For instance, where the k-mers of a read vary from the k-mers of the reference and/or the k-mers from one or more overlapping reads, e.g., in the pileup, a “bubble” will be formed in the graph at the point of difference, resulting in two divergent strings that will continue along two different path lines until matching between the two sequences resumes. Each vertex may be given a weighted score identifying how many times the respective k-mers overlap in all of the reads in the pileup. Particularly, each pathway extending through the generated graph from one side to the other may be given a count. And where the same k-mers are generated from a multiplicity of reads, e.g., where each k-mer has the same sequence pattern, they may be accounted for in the graph by increasing the count for that pathway where the k-mer overlaps an already existing k-mer pathway. Hence, where the same k-mer is generated from a multiplicity of overlapping reads having the same sequence, the pattern of the pathway through the graph will be repeated over and over again, and the count for traversing this pathway through the graph will be increased incrementally in correspondence therewith. In such an instance, the pattern is only recorded for the first instance of the k-mer, and the count is incrementally increased for each k-mer that repeats that pattern. In this mode the various reads in the pile up can be harvested to determine what variations occur and where.

In a manner such as this, a graph matrix may be formed by taking all possible N base k-mers, e.g., 10 base k-mers, which can be generated from each given read by sequentially walking the length of the read in ten base segments, where the beginning of each new ten base segment is offset by one base from the last generated 10 base segment. This procedure may then be repeated for every read in the pile up within the active window. The generated k-mers may then be aligned with one another such that areas of identical matching between the generated k-mers are matched to the areas where they overlap, so as to build up a data structure, e.g., a graph, that may then be scanned and the percentage of matching and mismatching determined. Particularly, the reference and any previously processed k-mers aligned therewith may be scanned with respect to the next generated k-mer to determine if the instant generated k-mer matches and/or overlaps any portion of a previously generated k-mer, and where it is found to match, the instant generated k-mer can then be inserted into the graph at the appropriate position.
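
The k-mer walk described above may be illustrated, purely by way of an assumption-laden example, with the following Python sketch; the dictionary-based edge store and the function names (kmers, build_graph) are illustrative conveniences rather than the actual graph data structure employed by any hardwired implementation.

    from collections import defaultdict

    K = 10  # e.g., 10 base k-mers, each offset by one base from the last

    def kmers(read, k=K):
        # Walk the read one base at a time, emitting overlapping k-base segments.
        return [read[i:i + k] for i in range(len(read) - k + 1)]

    def build_graph(reads, k=K):
        # Each edge joins a k-mer's leading (k-1)-mer to its trailing (k-1)-mer;
        # repeated k-mers from overlapping reads simply increment the edge count.
        edges = defaultdict(int)
        for read in reads:
            for kmer in kmers(read, k):
                edges[(kmer[:-1], kmer[1:])] += 1
        return edges

    # Two identical reads reinforce one pathway; the third read carries a SNP,
    # so its k-mers open a divergent, lower-count pathway (a "bubble").
    graph = build_graph(["ACGTACGTACGTACG",
                         "ACGTACGTACGTACG",
                         "ACGTACGAACGTACG"])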

Once built, the graph can be scanned, and it may be determined, based on this matching, whether any given SNPs and/or indels in the reads with respect to the reference are likely to be an actual variation in the subject's genetic code or the result of a processing or other error. For instance, if all or a significant portion of the k-mers, of all or a significant portion of all of the reads, in a given region include the same SNP and/or indel mismatch, i.e., differ from the reference in the same manner, then it may be determined that there is an actual SNP and/or indel variation in the subject's genome as compared to the reference genome. However, if only a limited number of k-mers from a limited number of reads evidence the artifact, it is likely to be caused by machine and/or processing and/or other error and not indicative of a true variation at the position in question.

As indicated, where there is a suspected variance, a bubble will be formed within the graph. Specifically, where all of the k-mers within all of a given region of reads match the reference, they will line up in such a manner as to form a linear graph. However, where there is a difference between the bases at a given locus, the graph will branch at that locus of difference. This branching may be at any position within the k-mer, and consequently, at that point of difference, the 10 base k-mer, including that difference, will diverge from the rest of the k-mers in the graph. In such an instance, a new node, forming a different pathway through the graph, will be formed.

Hence, where everything has been agreeing, e.g., the sequence in the given new k-mer being graphed matches the sequence to which it aligns in the graph, the pathway for that k-mer will, up to the point of difference, match the pathway for the graph generally and will be linear; but past the point of difference, a new pathway through the graph will emerge to accommodate the difference represented in the sequence of the newly graphed k-mer. This divergence is represented by a new node within the graph. In such an instance, any new k-mers to be added to the graph that match the newly divergent pathway will increase the count at that node. Hence, for every read that supports the arc, the count will be increased incrementally.

In various of such instances, the k-mer and/or the read it represents will once again start matching, e.g., after the point of divergence, such that there is now a point of convergence where the k-mer begins matching the main pathway through the graph represented by the k-mers of the reference sequence. For instance, after a while the read(s) that support the branched node should naturally rejoin the graph over time. Thus, over time, the k-mers for that read will rejoin the main pathway again. More particularly, for an SNP at a given locus within a read, the k-mer starting at that SNP will diverge from the main graph and will stay separate for about 10 nodes, because there are 10 bases per k-mer that overlap that locus of mismatching between the read and the reference. Hence, for an SNP, at the 11th position, the k-mers covering that locus within the read will rejoin the main pathway as exact matching is resumed. Consequently, it will take ten shifts for the k-mers of a read having an SNP at a given locus to rejoin the main graph represented by the reference sequence.

As indicated above, there is typically one main path or line or backbone that is the reference path, and where there is a divergence, a bubble is formed at a node where there is a difference between a read and the backbone graph. Thus there are some reads that diverge from the backbone and form a bubble, which divergence may be indicative of the presence of a variant. As the graph is processed, bubbles within bubbles within bubbles may be formed along the reference backbone, so that they are stacked up and a plurality of pathways through the graph may be created. In such an instance, there may be a main path represented by the reference backbone, one path of a first divergence, and a further path of a second divergence within the first divergence, all within a given window. Each pathway through the graph may represent an actual variation or may be an artifact, such as one caused by sequencing error, and/or PCR error, and/or a processing error, and the like.

Once such a graph has been produced, it must be determined which pathways through the graph represent actual variations present within the sample genome and which are mere artifacts. It is expected that reads containing handling or machine errors will not be supported by the majority of reads in the sample pileup; however, this is not always the case. For instance, errors in PCR processing may typically be the result of a cloning mistake that occurs when preparing the DNA sample, and such mistakes tend to result in an insertion and/or a deletion being added to the cloned sequence. Such indel errors may be more consistent among reads, and can wind up generating multiple reads that have the same error from this mistake in PCR cloning. Consequently, a higher count for the line through such a point of divergence may result because of such errors.

Hence, once a graph matrix has been formed, with many paths through the graph, the next stage is to traverse and thereby extract all of the paths through the graph, e.g., left to right. One path will be the reference backbone, but there will be other paths that follow various bubbles along the way. All paths must be traversed and their counts tabulated. For instance, if the graph includes a pathway with a two level bubble in one spot and a three level bubble in another spot, there will be 2×3=6 paths through that graph. So each of the paths will individually need to be extracted, which extracted paths are termed candidate haplotypes. Such candidate haplotypes represent theories for what could really be representative of the subject's actual DNA that was sequenced, and the following processing steps, including one or more of haplotype alignment, read likelihood calculation, and/or genotyping, may be employed to test these theories so as to find out the probabilities that any one and/or each of these theories is correct. The implementation of a De Bruijn graph reconstruction therefore represents a way to reliably extract a good set of hypotheses to test.
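
Purely as an illustrative sketch, and assuming the assembled graph has been reduced to an adjacency list between a single source and a single sink node, the exhaustive path extraction described above might be expressed as follows; the node labels and the function name extract_paths are hypothetical.

    # Enumerate every path from source to sink of a small directed acyclic
    # graph; each extracted path corresponds to one candidate haplotype.
    def extract_paths(graph, node, sink, prefix=None):
        prefix = (prefix or []) + [node]
        if node == sink:
            return [prefix]
        paths = []
        for nxt in graph.get(node, []):
            paths.extend(extract_paths(graph, nxt, sink, prefix))
        return paths

    # A two-level bubble followed by a three-level bubble yields 2 x 3 = 6 paths.
    graph = {"src": ["a1", "a2"], "a1": ["mid"], "a2": ["mid"],
             "mid": ["b1", "b2", "b3"],
             "b1": ["sink"], "b2": ["sink"], "b3": ["sink"]}
    print(len(extract_paths(graph, "src", "sink")))   # 6 candidate haplotypes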

For instance, in performing a variant call function, as disclosed herein, an active region identification operation may be implemented, such as for identifying places where multiple reads in a pile up within a given region disagree with the reference, and for generating a window around the identified active region, so that only these regions may be selected for further processing. Additionally, localized haplotype assembly may take place, such as where, for each given active region, all the overlapping reads in the pile up may be assembled into a “De Bruijn graph” (DBG) matrix. From this DBG, various paths through the matrix may be extracted, where each path constitutes a candidate haplotype, e.g., a hypothesis for what the true DNA sequence may be on at least one strand.

Further, haplotype alignment may take place, such as where each extracted haplotype candidate may be aligned, e.g., Smith-Waterman aligned, back to the reference genome, so as to determine what variation(s) from the reference it implies. Furthermore, a read likelihood calculation may be performed, such as where each read may be tested against each haplotype, to estimate a probability of observing the read assuming the haplotype was the true original DNA sampled. Finally, a genotyping operation may be implemented, and a variant call file produced. As indicated above, any or all of these operations may be configured so as to be implemented in an optimized manner in software and/or in hardware, and in various instances, because of the resource intensive and time consuming nature of building a DBG matrix and extracting candidate haplotypes therefrom, and/or because of the resource intensive and time consuming nature of performing a haplotype alignment and/or a read likelihood calculation, which may include the engagement of a Hidden Markov Model (HMM) evaluation, these operations (e.g., localized haplotype assembly, and/or haplotype alignment, and/or read likelihood calculation), or a portion thereof, may be configured so as to have one or more functions of their operation implemented in a hardwired form, such as for being performed in an accelerated manner by an integrated circuit as described herein. In various instances, these tasks may be configured to be implemented by one or more quantum circuits, such as in a quantum computing device.

Accordingly, in various instances, the devices, systems, and methods for performing the same may be configured so as to perform a haplotype alignment and/or a read likelihood calculation. For instance, as indicated, each extracted haplotype may be aligned, such as Smith-Waterman aligned, back to the reference genome, so as to determine what variation(s) from the reference it implies. In various exemplary instances, scoring may take place, such as in accordance with the following exemplary scoring parameters: a match=20.0; a mismatch=−15.0; a gap open=−26.0; and a gap extend=−1.1; other scoring parameters may also be used. Accordingly, in this manner, a CIGAR string may be generated and associated with the haplotype to produce an assembled haplotype, which assembled haplotype may eventually be used to identify variants. Accordingly, in a manner such as this, the likelihood of a given read being associated with a given haplotype may be calculated for all read/haplotype combinations. In such instances, the likelihood may be calculated using a Hidden Markov Model (HMM).
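
As a hedged, minimal sketch of such a scoring scheme, the following Python function computes a local (Smith-Waterman) alignment score with affine gap penalties using the exemplary parameters recited above; it returns only the best score, omits traceback and CIGAR string generation, and its name and structure are illustrative rather than a description of any particular hardwired implementation.

    # Smith-Waterman local alignment score with affine gaps (Gotoh recurrences),
    # using the exemplary scoring parameters recited above.
    def smith_waterman_score(hap, read, match=20.0, mismatch=-15.0,
                             gap_open=-26.0, gap_extend=-1.1):
        n, m = len(hap), len(read)
        NEG = float("-inf")
        H = [[0.0] * (m + 1) for _ in range(n + 1)]   # best score ending at (i, j)
        E = [[NEG] * (m + 1) for _ in range(n + 1)]   # gap opened/extended in haplotype
        F = [[NEG] * (m + 1) for _ in range(n + 1)]   # gap opened/extended in read
        best = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                s = match if hap[i - 1] == read[j - 1] else mismatch
                E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
                F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
                H[i][j] = max(0.0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
                best = max(best, H[i][j])
        return best

    print(smith_waterman_score("ACGTACGT", "ACGAACGT"))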

For instance, the various assembled haplotypes may be aligned in accordance with a dynamic programming model similar to a SW alignment. In such an instance, a virtual matrix may be generated, such as where the candidate haplotype, e.g., generated by the DBG, may be positioned on one axis of a virtual array, and the read may be positioned on the other axis. The matrix may then be filled out with the scores generated by traversing the extracted paths through the graph and calculating the probabilities that any given path is the true path. Hence, in such an instance, a difference between this alignment protocol and a typical SW alignment protocol is that, with respect to finding the most likely path through the array, a maximum likelihood calculation is used, such as a calculation performed by an HMM model that is configured to provide the total probability for alignment of the reads to the haplotype. Hence, an actual CIGAR string alignment, in this instance, need not be produced. Rather, all possible alignments are considered and their probabilities are summed. The pair HMM evaluation is resource and time intensive, and thus, implementing its operations within a hardwired configuration within an integrated circuit, or via quantum circuits on a quantum computing platform, is very advantageous.

For example, each read may be tested against each candidate haplotype, so as to estimate a probability of observing the read assuming the haplotype is the true representative of the original DNA sampled. In various instances, this calculation may be performed by evaluating a “pair hidden Markov model” (HMM), which may be configured to model the various possible ways the haplotype candidate might have been modified, such as by PCR or sequencing errors, and the like, and a variation thereby introduced into the observed read. In such instances, the HMM evaluation may employ a dynamic programming method to calculate the total probability of any series of Markov state transitions arriving at the observed read, in view of the possibility that any divergence in the read may be the result of an error model. Accordingly, such HMM calculations may be configured to analyze all the possible SNPs and indels that could have been introduced into one or more of the reads, such as by amplification and/or sequencing artifacts.

Particularly, the pair HMM considers, in a virtual matrix, all the possible alignments of the read to the reference candidate haplotypes, along with a probability associated with each of them, where all probabilities are added up. The sum of all of the probabilities of all the variants along a given path is added up to get one overarching probability for each read. This process is then performed for every pair, i.e., for every haplotype/read pair. For example, if a pile up cluster overlapping a given region yields six haplotype candidates, and if the pile up includes about one hundred reads, 600 HMM operations will then need to be performed. More particularly, if there are 6 haplotypes, then there are going to be 6 branches through the path, and the probability that each one is the correct pathway that matches the subject's actual genetic code for that region must be calculated. Consequently, each pathway for all of the reads must be considered, and the probability that each read would arrive at this given haplotype is to be calculated.

The pair Hidden Markov Model is an approximate model for how a true haplotype in the sampled DNA may transform into a possibly different detected read. It has been observed that these types of transformations are a combination of SNPs and indels that have been introduced into the genetic sample set by the PCR process, by one or more of the other sample preparation steps, and/or by an error caused by the sequencing process, and the like. As can be seen with respect to FIG. 1, to account for these types of errors, an underlying 3-state base model may be employed, such as where: (M=alignment match, I=insertion, D=deletion), further where any transition is possible except I<->D.

As can be seen with respect to FIG. 1, the 3-state base model transitions are not in a time sequence, but rather are in a sequence of progression through the candidate haplotype and read sequences, beginning at position 0 in each sequence, where the first base is position 1. A transition to M implies position +1 in both sequences; a transition to I implies position +1 in the read sequence only; and a transition to D implies position +1 in the haplotype sequence only. The same 3-state model may be configured to underlie the Smith-Waterman and/or Needleman-Wunsch alignments, as herein described, as well. Accordingly, such a 3-state model, as set forth herein, may be employed in a SW and/or NW process, thereby allowing for affine gap (indel) scoring, in which gap opening (entering the I or D state) is assumed to be less likely than gap extension (remaining in the I or D state). Hence, in this instance, the pair HMM can be seen as an alignment, and a CIGAR string may be produced to encode a sequence of the various state transitions.

In various instances, the 3-state base model may be complicated by allowing the transition probabilities to vary by position. For instance, the probabilities of all M transitions may be multiplied by the prior probabilities of observing the next read base given its base quality score and the corresponding next haplotype base. In such an instance, the base quality scores may translate to a probability of a sequencing SNP error. When the two bases match, the prior probability is taken as one minus this error probability, and when they mismatch, it is taken as the error probability divided by 3, since there are 3 possible SNP results.
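
A minimal sketch of this prior calculation, assuming the standard Phred relation between a base quality score and its error probability, is shown below; the function names are illustrative only.

    # Convert a Phred-scaled base quality to the prior probability used for
    # an M transition: (1 - error) on a base match, error/3 on a mismatch.
    def phred_to_error_prob(q):
        return 10.0 ** (-q / 10.0)

    def match_prior(read_base, hap_base, base_quality):
        err = phred_to_error_prob(base_quality)
        return (1.0 - err) if read_base == hap_base else (err / 3.0)

    print(match_prior("A", "A", 30))   # 0.999
    print(match_prior("A", "G", 30))   # ~0.000333 (error split over 3 possible SNP results)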

The above discussion is regarding an abstract “Markovish” model. In various instances, the maximum-likelihood transition sequence may also be determined, which is termed herein as an alignment, and may be performed using a Needleman-Wunsch or other dynamic programming algorithm. But, in various instances, in performing a variant calling function, as disclosed herein, the maximum likelihood alignment, or any particular alignment, need not be a primary concern. Rather, the total probability may be computed, for instance, by computing the total probability of observing the read given the haplotype, which is the sum of the probabilities of all possible transition paths through the graph, from read position zero at any haplotype position, to the read end position at any haplotype position, each component path probability being simply the product of the various constituent transition probabilities.

Finding the sum of pathway probabilities may also be performed by employing a virtual array and using a dynamic programming algorithm, as described above, such that in each cell of a (0 . . . N)×(0 . . . M) matrix, there are three probability values calculated, corresponding to the M, D, and I transition states. (Or equivalently, there are 3 matrices.) The top row (read position zero) of the matrix may be initialized to probability 1.0 in the D states, and 0.0 in the I and M states; and the rest of the left column (haplotype position zero) may be initialized to all zeros. (In software, the initial D probabilities may be set near the double-precision max value, e.g., 2^1020, so as to avoid underflow, but this factor may be normalized out later.)

This 3-to-1 computation dependency restricts the order in which cells may be computed. They can be computed left to right in each row, progressing through rows from top to bottom, or top to bottom in each column, progressing rightward. Additionally, they may be computed in anti-diagonal wavefronts, where the next step is to compute all cells (n,m) where n+m equals the incremented step number. This wavefront order has the advantage that all cells in the anti-diagonal may be computed independently of each other. The bottom row of the matrix, at the final read position, may then be configured to represent the completed alignments. In such an instance, the Haplotype Caller will work by summing the I and M probabilities of all bottom row cells. In various embodiments, the system may be set up so that no D transitions are permitted within the bottom row, or a D transition probability of 0.0 may be used there, so as to avoid double counting.
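
For explanatory purposes only, the following Python sketch evaluates the sum of pathway probabilities in a (0 . . . N)×(0 . . . M) arrangement of M, I, and D values, using the initialization and the bottom-row summation described above; the specific transition probabilities are placeholder assumptions and not the tuned parameters of any particular implementation.

    # Pair HMM forward (sum-of-paths) evaluation over one read/haplotype pair.
    def pair_hmm_forward(read, hap, quals, gap_open=0.01, gap_ext=0.1):
        def prior(rb, hb, q):                  # M-transition prior from base quality
            err = 10.0 ** (-q / 10.0)
            return (1.0 - err) if rb == hb else (err / 3.0)
        tMM = 1.0 - 2.0 * gap_open             # remain in Match
        tMI = tMD = gap_open                   # open a gap (enter I or D)
        tII = tDD = gap_ext                    # extend a gap
        tIM = tDM = 1.0 - gap_ext              # close a gap back to Match
        N, Mh = len(read), len(hap)
        M = [[0.0] * (Mh + 1) for _ in range(N + 1)]
        I = [[0.0] * (Mh + 1) for _ in range(N + 1)]
        D = [[0.0] * (Mh + 1) for _ in range(N + 1)]
        for j in range(Mh + 1):
            D[0][j] = 1.0                      # top row (read position zero) in the D state
        for i in range(1, N + 1):
            for j in range(1, Mh + 1):
                p = prior(read[i - 1], hap[j - 1], quals[i - 1])
                M[i][j] = p * (tMM * M[i - 1][j - 1]
                               + tIM * I[i - 1][j - 1]
                               + tDM * D[i - 1][j - 1])
                I[i][j] = tMI * M[i - 1][j] + tII * I[i - 1][j]
                D[i][j] = tMD * M[i][j - 1] + tDD * D[i][j - 1]
        # Completed alignments: sum the M and I values across the bottom row.
        return sum(M[N][j] + I[N][j] for j in range(Mh + 1))

    print(pair_hmm_forward("ACGT", "ACGT", [30, 30, 30, 30]))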

As described herein, in various instances, each HMM evaluation may operate on a sequence pair, such as on a candidate haplotype and read pair. For instance, within a given active region, each of a set of haplotypes may be HMM-evaluated vs. each of a set of reads. In such an instance, the software and/or hardware input bandwidth may be reduced and/or minimized by transferring the set of reads and the set of haplotypes once, and letting the software and/or hardware generate the N×M pair operations. In certain instances, a Smith-Waterman evaluator may be configured to queue up individual HMM operations, each with its own copy of read and haplotype data. A Smith-Waterman (SW) alignment module may be configured to run the pair HMM calculation in linear space or may operate in log probability space. The latter is useful for keeping precision across the huge range of probability values when using fixed-point values. However, in other instances, floating point operations may be used.

There are three parallel multiplications (e.g., additions in log space), then two serial additions (~5-6 stage approximation pipelines), then an additional multiplication. In such an instance, the full pipeline may be about L=12-16 cycles long. The I & D calculations may be about half that length. The pipeline may be fed a multiplicity of input probabilities, such as 2 or 3 or 5 or 7 or more input probabilities each cycle, such as from one or more already computed neighboring cells (M and/or D from the left, M and/or I from above, and/or M and/or I and/or D from above-left). It may also take in one or more haplotype bases, and/or one or more read bases, such as with associated parameters, e.g., pre-processed parameters, each cycle. It outputs the M & I & D result set for one cell each cycle, after fall-through latency.

As indicated above, in performing a variant call function, as disclosed herein, a De Bruijn Graph may be formulated, and when all of the reads in a pile up are identical, the DBG will be linear. However, where there are differences, the graph will form “bubbles” that are indicative of regions of differences, resulting in multiple paths diverging from matching the reference alignment and then later re-joining in matching alignment. From this DBG, various paths may be extracted, which form candidate haplotypes, e.g., hypotheses for what the true DNA sequence may be on at least one strand, which hypotheses may be tested by performing an HMM, or modified HMM, operation on the data. Further still, a genotyping function may be employed, such as where the possible diploid combinations of the candidate haplotypes may be formed, and for each of them, a conditional probability of observing the entire read pileup may be calculated. These results may then be fed into a Bayesian formula module to calculate an absolute probability that each genotype is the truth, given the entire read pileup observed.

Hence, in accordance with the devices, systems, and methods of their use described herein, in various instances, a genotyping operation may be performed, which genotyping operation may be configured so as to be implemented in an optimized manner in software and/or in hardware and/or by a quantum processing unit. For instance, the possible diploid combinations of the candidate haplotypes may be formed, and for each combination, a conditional probability of observing the entire read pileup may be calculated, such as by using the constituent probabilities of observing each read given each haplotype from the pair HMM evaluation. The results of these calculations feed into a Bayesian formula so as to calculate an absolute probability that each genotype is the truth, given the entire read pileup observed.
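
As one hedged illustration of this genotyping step, assuming the per-read, per-haplotype likelihoods from the pair HMM evaluation are already available in a lookup table, the diploid genotype likelihood may be taken as the product over reads of the average of the two constituent haplotype likelihoods, with a Bayesian normalization then yielding a posterior for each genotype; the names and the flat prior are illustrative assumptions.

    from itertools import combinations_with_replacement

    def genotype_posteriors(haplotypes, reads, read_hap_likelihood):
        # read_hap_likelihood[(read, hap)] = P(read | hap), from the pair HMM.
        likelihoods = {}
        for h1, h2 in combinations_with_replacement(haplotypes, 2):
            lk = 1.0
            for r in reads:
                # A read is drawn from either chromosome with equal probability.
                lk *= 0.5 * (read_hap_likelihood[(r, h1)]
                             + read_hap_likelihood[(r, h2)])
            likelihoods[(h1, h2)] = lk
        # Bayesian normalization with a flat prior over diploid genotypes.
        total = sum(likelihoods.values())
        return {g: lk / total for g, lk in likelihoods.items()}

    probs = {("r1", "H0"): 0.9, ("r1", "H1"): 0.1,
             ("r2", "H0"): 0.2, ("r2", "H1"): 0.8}
    print(genotype_posteriors(["H0", "H1"], ["r1", "r2"], probs))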

Accordingly, in various aspects, the present disclosure is directed to a system for performing a haplotype or variant call operation on generated and/or supplied data so as to produce a variant call file with respect thereto. Specifically, as described herein above, in particular instances, a variant call file may be a digital or other such file that encodes the difference between one sequence and another, such as the difference between a sample sequence and a reference sequence. Specifically, in various instances, the variant call file may be a text file that sets forth or otherwise details the genetic and/or structural variations in a person's genetic makeup as compared to one or more reference genomes.

For instance, a haplotype is a set of genetic, e.g., DNA and/or RNA, variations, such as polymorphisms, that reside in a person's chromosomes and as such may be passed on to offspring and thereby inherited together. Particularly, a haplotype can refer to a combination of alleles, e.g., one of a plurality of alternative forms of a gene such as may arise by mutation, which allelic variations are typically found at the same place on a chromosome. Hence, in determining the identity of a person's genome, it is important to know which form of the various different possible alleles a specific person's genetic sequence codes for. In particular instances, a haplotype may refer to one or more, e.g., a set, of nucleotide polymorphisms (e.g., SNPs) that may be found at the same position on the same chromosome.

Typically, in various embodiments, in order to determine the genotype, e.g., allelic haplotypes, for a subject, as described herein and above, a software based algorithm may be engaged, such as an algorithm employing a haplotype call program, e.g., GATK, for simultaneously determining SNPs and/or insertions and/or deletions, i.e., indels, in an individual's genetic sequence. In particular, the algorithm may involve one or more haplotype assembly protocols, such as for local de-novo assembly of a haplotype in one or more active regions of the genetic sequence being processed. Such processing typically involves the deployment of a processing function called a Hidden Markov Model (HMM), which is a stochastic and/or statistical model used to exemplify randomly changing systems, such as where it is assumed that future states within the system depend only on the present state and not on the sequence of events that precedes it.

In such instances, the system being modeled bears the characteristics of, or is otherwise assumed to be, a Markov process with unobserved (hidden) states. In particular instances, the model may involve a simple dynamic Bayesian network. Particularly, with respect to determining genetic variation, in its simplest form, there is one of four possibilities for the identity of any given base in a sequence being processed, such as when comparing a segment of a reference sequence, e.g., a hypothetical haplotype, and that of a subject's DNA or RNA, e.g., a read derived from a sequencer. However, in order to determine such variation, in a first instance, a subject's DNA/RNA must be sequenced, e.g., via a Next Gen Sequencer (“NGS”), to produce a readout or “reads” that identify the subject's genetic code. Next, once the subject's genome has been sequenced to produce one or more reads, the various reads, representative of the subject's DNA and/or RNA, need to be mapped and/or aligned, as herein described above in great detail. The next step in the process then is to determine how the genes of the subject that have just been determined, e.g., having been mapped and/or aligned, vary from those of a prototypical reference sequence. In performing such analysis, therefore, it is assumed that the read potentially representing a given gene of a subject is a representation of the prototypical haplotype, albeit with various SNPs and/or indels that are presently to be determined.

Specifically, in particular aspects, devices, systems, and/or methods for practicing the same, such as for performing a haplotype and/or variant call function, such as deploying an HMM function, for instance, in an accelerated haplotype caller, are provided. In various instances, in order to overcome these and other such various problems known in the art, the HMM accelerator herein presented may be configured to be operated in a manner so as to be implemented in software, implemented in hardware, or a combination of being implemented and/or otherwise controlled in part by software and/or in part by hardware, and/or may include quantum computing implementations. For instance, in a particular aspect, the disclosure is directed to a method by which data pertaining to the DNA and/or RNA sequence identity of a subject, and/or how the subject's genetic information may differ from that of a reference genome, may be determined.

In such an instance, the method may be performed by the implementation of a haplotype or variant call function, such as employing an HMM protocol. Particularly, the HMM function may be performed in hardware, software, or via one or more quantum circuits, such as on an accelerated device, in accordance with a method described herein. In such an instance, the HMM accelerator may be configured to receive and process the sequenced, mapped, and/or aligned data, e.g., to produce a variant call file, as well as to transmit the processed data back throughout the system. Accordingly, the method may include deploying a system where data may be sent from a processor, such as a software-controlled CPU or GPU or even a QPU, to a haplotype caller implementing an accelerated HMM, which haplotype caller may be deployed on a microprocessor chip, such as an FPGA, ASIC, or structured ASIC, or implemented by one or more quantum circuits. The method may further include the steps for processing the data to produce HMM result data, which results may then be fed back to the CPU and/or GPU and/or QPU.

Particularly, in one embodiment, as can be seen with respect to FIG. 2, a bioinformatics pipeline system including an HMM accelerator is provided. For instance, in one instance, the bioinformatics pipeline system may be configured as a variant call system 1. The system is illustrated as being implemented in hardware, but may also be implemented via one or more quantum circuits, such as of a quantum computing platform. Specifically, FIG. 2 provides a high level view of an HMM interface structure. In particular embodiments, the variant call system 1 is configured to accelerate at least a portion of a variant call operation, such as an HMM operation. Hence, in various instances, the variant call system may be referenced herein as an HMM system 1. The system 1 includes a server having one or more central processing units (CPU/GPU/QPU) 1000 configured for performing one or more routines related to the sequencing and/or processing of genetic information, such as for comparing a sequenced genetic sequence to one or more reference sequences.

Additionally, the system 1 includes a peripheral device 2, such as an expansion card, that includes a microchip 7, such as an FPGA, ASIC, or sASIC. In some instances, one or more quantum circuits may be provided and configured for performing the various operations set forth herein. It is also to be noted that the term ASIC may refer equally to a structured ASIC (sASIC), where appropriate. The peripheral device 2 includes an interconnect 3 and a bus interface 4, such as a parallel or serial bus, which connects the CPU/GPU/QPU 1000 with the chip 7. For instance, the device 2 may comprise a peripheral component interconnect, such as a PCI, PCI-X, PCIe, or QPI (quick path interconnect), and may include a bus interface 4 that is adapted to operably and/or communicably connect the CPU/GPU/QPU 1000 to the peripheral device 2, such as for low latency, high data transfer rates. Accordingly, in particular instances, the interface may be a peripheral component interconnect express (PCIe) 4 that is associated with the microchip 7, which microchip includes an HMM accelerator 8. For example, in particular instances, the HMM accelerator 8 is configured for performing an accelerated HMM function, such as where the HMM function, in certain embodiments, may at least partially be implemented in the hardware of the FPGA, ASIC, or sASIC, or via one or more suitably configured quantum circuits.

Specifically, FIG. 2 presents a high-level figure of an HMM accelerator 8 having an exemplary organization of one or more engines 13, such as a plurality of processing engines 13a-13(m+1), for performing one or more processes of a variant call function, such as including an HMM task. Accordingly, the HMM accelerator 8 may be composed of a data distributor 9, e.g., CentCom, and one or a multiplicity of processing clusters 11-11(n+1) that may be organized as or otherwise include one or more instances 13, such as where each instance may be configured as a processing engine, such as a small engine 13a-13(m+1). For instance, the distributor 9 may be configured for receiving data, such as from the CPU/GPU/QPU 1000, and distributing or otherwise transferring that data to one or more of the multiplicity of HMM processing clusters 11.

Particularly, in certain embodiments, the distributor 9 may be positioned logically between the on-board PCIe interface 4 and the HMM accelerator module 8, such as where the interface 4 communicates with the distributor 9 over an interconnect or other suitably configured bus 5, e.g., a PCIe bus. The distributor module 9 may be adapted for communicating with one or more HMM accelerator clusters 11, such as over one or more cluster buses 10. For instance, the HMM accelerator module 8 may be configured as or otherwise include an array of clusters 11a-11(n+1), such as where each HMM cluster 11 may be configured as or otherwise includes a cluster hub 11 and/or may include one or more instances 13, which instance may be configured as a processing engine 13 that is adapted for performing one or more operations on data received thereby. Accordingly, in various embodiments, each cluster 11 may be formed as or otherwise include a cluster hub 11a-11(n+1), where each of the hubs may be operably associated with multiple HMM accelerator engine instances 13a-13(m+1), such as where each cluster hub 11 may be configured for directing data to a plurality of the processing engines 13a-13(m+1) within the cluster 11.

In various instances, the HMM accelerator 8 is configured for comparing each base of a subject's sequenced genetic code, such as in read format, with the various known or generated candidate haplotypes of a reference sequence, and determining the probability that any given base at a position being considered either matches or doesn't match the relevant haplotype, e.g., the read includes an SNP, an insertion, or a deletion, thereby resulting in a variation of the base at the position being considered. Particularly, in various embodiments, the HMM accelerator 8 is configured to assign transition probabilities for the sequence of the bases of the read going between each of these states, Match (“M”), Insert (“I”), or Delete (“D”), as described in greater detail herein below.

More particularly, dependent on the configuration, the HMM acceleration function may be implemented either in software, such as by the CPU/GPU/QPU 1000 and/or microchip 7, and/or may be implemented in hardware and may be present within the microchip 7, such as positioned on the peripheral expansion card or board 2. In various embodiments, this functionality may be implemented partially as software, e.g., run by the CPU/GPU/QPU 1000, and partially as hardware, implemented on the chip 7 or via one or more quantum processing circuits. Accordingly, in various embodiments, the chip 7 may be present on the motherboard of the CPU/GPU/QPU 1000, or it may be part of the peripheral device 2, or both. Consequently, the HMM accelerator module 8 may include or otherwise be associated with various interfaces, e.g., 3, 5, 10, and/or 12, so as to allow the efficient transfer of data to and from the processing engines 13.

Accordingly, as can be seen with respect to FIGS. 2 and 3, in various embodiments, a microchip 7 configured for performing a variant, e.g., haplotype, call function is provided. The microchip 7 may be associated with a CPU/GPU/QPU 1000, such as directly coupled therewith, e.g., included on the motherboard of a computer, or indirectly coupled thereto, such as being included as part of a peripheral device 2 that is operably coupled to the CPU/GPU/QPU 1000, such as via one or more interconnects, e.g., 3, 4, 5, 10, and/or 12. In this instance, the microchip 7 is present on the peripheral device 2. It is to be understood that although configured as a microchip, the accelerator could also be configured as one or more quantum circuits of a quantum processing unit, wherein the quantum circuits are configured as one or more processing engines for performing one or more of the functions disclosed herein.

Hence, the peripheral device 2 may include a parallel or serial expansion bus 4, such as for connecting the peripheral device 2 to the central processing unit (CPU/GPU/QPU) 1000 of a computer and/or server, such as via an interface 3, e.g., DMA. In particular instances, the peripheral device 2 and/or serial expansion bus 4 may be a Peripheral Component Interconnect express (PCIe) that is configured to communicate with or otherwise include the microchip 7, such as via connection 5. As described herein, the microchip 7 may at least partially be configured as or may otherwise include an HMM accelerator 8. The HMM accelerator 8 may be configured as part of the microchip 7, e.g., as hardwired and/or as code to be run in association therewith, and is configured for performing a variant call function, such as for performing one or more operations of a Hidden Markov Model, on data supplied to the microchip 7 by the CPU/GPU/QPU 1000, such as over the PCIe interface 4. Likewise, once one or more variant call functions have been performed, e.g., one or more HMM operations run, the results thereof may be transferred from the HMM accelerator 8 of the chip 7 over the bus 4 to the CPU/GPU/QPU 1000, such as via connection 3.

For instance, in particular instances, a CPU/GPU/QPU 1000 for processing and/or transferring information and/or executing instructions is provided along with a microchip 7 that is at least partially configured as an HMM accelerator 8. The CPU/GPU/QPU 1000 communicates with the microchip 7 over an interface 5 that is adapted to facilitate the communication between the CPU/GPU/QPU 1000 and the HMM accelerator 8 of the microchip 7, and therefore may communicably connect the CPU/GPU/QPU 1000 to the HMM accelerator 8 that is part of the microchip 7. To facilitate these functions, the microchip 7 includes a distributor module 9, which may be a CentCom, that is configured for transferring data to a multiplicity of HMM engines 13, e.g., via one or more clusters 11, where each engine 13 is configured for receiving and processing the data, such as by running an HMM protocol thereon, computing final values, outputting the results thereof, and repeating the same. In various instances, the performance of an HMM protocol may include determining one or more transition probabilities, as described herein below. Particularly, each HMM engine 13 may be configured for performing a job, such as including one or more of the generating and/or evaluating of an HMM virtual matrix, to produce and output a final sum value with respect thereto, which final sum expresses the probable likelihood that the called base matches or is different from a corresponding base in a hypothetical haplotype sequence, as described herein below.

FIG. 3 presents a detailed depiction of the HMM cluster 11 of FIG. 2. In various embodiments, each HMM cluster 11 includes one or more HMM instances 13. One or a number of clusters may be provided, such as desired in accordance with the amount of resources provided, such as on the chip or quantum computing processor. Particularly, an HMM cluster may be provided, where the cluster is configured as a cluster hub 11. The cluster hub 11 takes the data pertaining to one or more jobs 20 from the distributor 9, and is further communicably connected to one or more, e.g., a plurality of, HMM instances 13, such as via one or more HMM instance busses 12, to which the cluster hub 11 transmits the job data 20.

The transfer of data throughout the system may be a relatively low bandwidth process, and once a job 20 is received, the system 1 may be configured for completing the job, such as without having to go off chip 7 for memory. In various embodiments, one job 20a is sent to one processing engine 13a at any given time, but several jobs 20a-n may be distributed by the cluster hub 11 to several different processing engines 13a-13(m+1), such as where each of the processing engines 13 will be working on a single job 20, e.g., a single comparison between one or more reads and one or more haplotype sequences, in parallel and at high speed. As described below, the performance of such a job 20 may typically involve the generation of a virtual matrix whereby the subject's “read” sequences may be compared to one or more, e.g., two, hypothetical haplotype sequences, so as to determine the differences there between. In such instances, a single job 20 may involve the processing of one or more matrices having a multiplicity of cells therein that need to be processed for each comparison being made, such as on a base by base basis. As the human genome is about 3 billion base pairs, there may be on the order of 1 to 2 billion different jobs to be performed when analyzing a 30× oversampling of a human genome (which equates to about 20 trillion cells in the matrices of all associated HMM jobs).
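
As a rough, back-of-the-envelope check only, and assuming illustrative read and haplotype lengths rather than measured values, the job and cell counts quoted above are mutually consistent:

    # Illustrative scale check (the read and haplotype lengths are assumptions).
    jobs = 1e9                       # ~1-2 billion HMM jobs at 30x coverage
    read_len, hap_len = 100, 200     # ~100 base reads vs. several-hundred-base haplotypes
    cells_per_job = read_len * hap_len
    print(jobs * cells_per_job)      # ~2e13, i.e., on the order of 20 trillion cells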

Accordingly, as described herein, each HMM instance 13 may be adapted so as to perform an HMM protocol, e.g., the generating and processing of an HMM matrix, on sequence data, such as data received thereby from the CPU/GPU/QPU 1000. For example, as explained above, in sequencing a subject's genetic material, such as DNA or RNA, the DNA/RNA is broken down into segments, such as up to about 100 bases in length. The identity of these 100 base segments is then determined, such as by an automated sequencer, and “read” into a FASTQ text based file or other format that stores each base identity of the read along with a Phred quality score (e.g., typically a number between 0 and 63 in log scale, where a score of 0 indicates the least amount of confidence that the called base is correct, with scores between 20 and 45 generally being acceptable as relatively accurate).

Particularly, as indicated above, a Phred quality score is a quality indicator that measures the quality of the identification of the nucleobase identities generated by the sequencing processor, e.g., by the automated DNA/RNA sequencer. Hence, each read base includes its own quality, e.g., Phred, score based on what the sequencer evaluated the quality of that specific identification to be. The Phred score represents the confidence with which the sequencer estimates that it got the called base identity correct. This Phred score is then used by the implemented HMM module 8, as described in detail below, to further determine the accuracy of each called base in the read as compared to the haplotype to which it has been mapped and/or aligned, such as by determining its Match, Insertion, and/or Deletion transition probabilities, e.g., in and out of the Match state. It is to be noted that in various embodiments, the system 1 may modify or otherwise adjust the initial Phred score prior to the performance of an HMM protocol thereon, such as by taking into account neighboring bases/scores and/or fragments of neighboring DNA and allowing such factors to influence the Phred score of the base, e.g., cell, under examination.
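
Assuming the standard Phred scaling, in which a quality score Q corresponds to an error probability of 10^(−Q/10), the score range mentioned above maps to error probabilities as follows (illustrative conversion only):

    # Standard Phred scaling: Q = -10 * log10(P_error).
    for q in (0, 20, 30, 45, 63):
        print(q, 10.0 ** (-q / 10.0))
    # 0 -> 1.0 (no confidence), 20 -> 1e-2, 45 -> ~3.2e-5, 63 -> ~5e-7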

In such instances, as can be seen with respect to FIG. 4, the system 1, e.g., computer/quantum software, may determine and identify various active regions 500n within the sequenced genome that may be explored and/or otherwise subjected to further processing as herein described, which may be broken down into jobs 20n that may be parallelized amongst the various cores and available threads 1007 throughout the system 1. For instance, such active regions 500 may be identified as being sources of variation between the sequenced and reference genomes. Particularly, the CPU/GPU/QPU 1000 may have multiple threads 1007 running, identifying active regions 500a, 500b, and 500c, compiling and aggregating various different jobs 20n to be worked on, e.g., via a suitably configured aggregator 1008, based on the active region(s) 500a-c currently being examined. Any suitable number of threads 1007 may be employed so as to allow the system 1 to run at maximum efficiency, e.g., the more threads present, the less active time spent waiting.

Once identified, compiled, and/or aggregated, the threads 1007/1008 will then transfer the active jobs 20 to the data distributor 9, e.g., CentCom, of the HMM module 8, such as via the PCIe interface 4, e.g., in a fire and forget manner, and will then move on to a different process while waiting for the HMM 8 to send the output data back so as to be matched back up to the corresponding active region 500 to which it maps and/or aligns. The data distributor 9 will then distribute the jobs 20 to the various different HMM clusters 11, such as in a job-by-job manner. If everything is running efficiently, this may be on a first in first out format, but such does not need to be the case. For instance, in various embodiments, raw job data and processed job results data may be sent through and across the system as they become available.

Particularly, as can be seen with respect to FIGS. 2, 3, and 4, the various job data 20 may be aggregated into 4K byte pages of data, which may be sent via the PCIe 4 to and through the CentCom 9 and on to the processing engines 13, e.g., via the clusters 11. The amount of data being sent may be more or less than 4K bytes, but will typically include about 100 HMM jobs per 4K (e.g., 1024) page of data. Particularly, these data then get digested by the data distributor 9 and are fed to each cluster 11, such as where one 4K page is sent to one cluster 11. However, such need not be the case, as any given job 20 may be sent to any given cluster 11, based on which clusters become available and when.

Accordingly, the cluster 11 approach as presented here efficiently distributes incoming data to the processing engines 13 at high speed. Specifically, as data arrives at the PCIe interface 4 from the CPU/GPU/QPU 1000, e.g., over DMA connection 3, the received data may then be sent over the PCIe bus 5 to the CentCom distributor 9 of the variant caller microchip 7. The distributor 9 then sends the data to one or more HMM processing clusters 11, such as over one or more cluster dedicated buses 10, which cluster 11 may then transmit the data to one or more processing instances 13, e.g., via one or more instance buses 12, such as for processing. In this instance, the PCIe interface 4 is adapted to provide data through the peripheral expansion bus 5, distributor 9, and/or cluster 10 and/or instance 12 busses at a rapid rate, such as at a rate that can keep one or more, e.g., all, of the HMM accelerator instances 13a-(m+1) within one or more, e.g., all, of the HMM clusters 11a-(n+1) busy over a prolonged period of time, e.g., full time, during the period over which the system 1 is being run and the jobs 20 are being processed, whilst also keeping up with the output of the processed HMM data that is to be sent back to one or more CPUs 1000 over the PCIe interface 4.

For instance, any inefficiency in the interfaces 3, 5, 10, and/or 12 that leads to idle time for one or more of the HMM accelerator instances 13 may directly add to the overall processing time of the system 1. Particularly, when analyzing a human genome, there may be on the order of two or more billion different jobs 20 that need to be distributed to the various HMM clusters 11 and processed over the course of a time period, such as under 1 hour, under 45 minutes, under 30 minutes, under 20 minutes, including 15 minutes, 10 minutes, 5 minutes, or less.
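
For orientation only, the sustained job rate implied by those figures can be estimated as follows; this is illustrative arithmetic under the stated assumptions rather than a performance claim for any particular configuration:

    # Required sustained HMM job throughput for ~2 billion jobs per genome.
    jobs = 2e9
    for minutes in (60, 45, 30, 15, 5):
        print(minutes, jobs / (minutes * 60.0))   # jobs per second
    # e.g., a 30 minute target implies roughly 1.1 million HMM jobs per second.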

Accordingly, FIG. 4 sets forth an overview of an exemplary data flow throughout the software and/or hardware of the system 1, as described generally above. As can be seen with respect to FIG. 4, the system 1 may be configured in part to transfer data, such as between the PCIe interface 4 and the distributor 9, e.g., CentCom, such as over the PCIe bus 5. Additionally, the system 1 may further be configured in part to transfer the received data, such as between the distributor 9 and the one or more HMM clusters 11, such as over the one or more cluster buses 10. Hence, in various embodiments, the HMM accelerator 8 may include one or more clusters 11, such as one or more clusters 11 configured for performing one or more processes of an HMM function. In such an instance, there is an interface, such as a cluster bus 10, that connects the CentCom 9 to the HMM cluster 11.

For instance, FIG. 5 is a high level diagram depicting the interface into and out of the HMM module 8, such as into and out of a cluster module. As can be seen with respect to FIG. 5, each HMM cluster 11 may be configured to communicate with, e.g., receive data from and/or send final result data, e.g., sum data, to the CentCom data distributor 9 through a dedicated cluster bus 10. Particularly, any suitable interface or bus 5 may be provided so long as it allows the PCIe interface 4 to communicate with the data distributor 9. More particularly, the bus 5 may be an interconnect that includes the interpretation logic useful in talking to the data distributor 9, which interpretation logic may be configured to accommodate any protocol employed to provide this functionality. Specifically, in various instances, the interconnect may be configured as a PCIe bus 5.

Additionally, the cluster 11 may be configured such that single or multiple clock domains may be employed therein, and hence, one or more clocks may be present within the cluster 11. In particular instances, multiple clock domains may be provided. For example, a slower clock may be provided, such as for communications, e.g., to and from the cluster 11. Additionally, a faster, e.g., high speed, clock may be provided, which may be employed by the HMM instances 13 for use in performing the various state calculations described herein.

Particularly, in various embodiments, as can be seen with respect to FIG. 5, the system 1 may be set up such that, in a first instance, as the data distributor 9 leverages the existing CentCom IP, a collar, such as a gasket, may be provided, where the gasket is configured for translating signals to and from the CentCom interface 5 from and to the HMM cluster interface or bus 10. For instance, an HMM cluster bus 10 may communicably and/or operably connect the CPU/GPU 1000 to the various clusters 11 of the HMM accelerator module 8. Hence, as can be seen with respect to FIG. 5, structured write and/or read data for each haplotype and/or for each read may be sent throughout the system 1.

Following a job 20 being input into the HMM engine, an HMM engine 13 may typically start either: a) immediately, if it is IDLE, or b) after it has completed its currently assigned task. It is to be noted that each HMM accelerator engine 13 can handle ping and pong inputs (e.g., can be working on one data set while the other is being loaded), thus minimizing downtime between jobs. Additionally, the HMM cluster collar 11 may be configured to automatically take the input job 20 sent by the data distributor 9 and assign it to one of the HMM engine instances 13 in the cluster 11 that can receive a new job. There need not be a control on the software side that can select a specific HMM engine instance 13 for a specific job 20. However, in various instances, the software can be configured to control such instances.

Accordingly, in view of the above, the system 1 may be streamlined when transferring the results data back to the CPU/GPU/QPU, and because of this efficiency only a small amount of data needs to be returned to the CPU/GPU/QPU for the results to be useful. This allows the system to complete a variant call operation in about 30 minutes or less, such as in about 25 or about 20 minutes or less, for instance, in about 18 or about 15 minutes or less, including in about 10 or about 7 minutes or less, or even in about 5 or about 3 minutes or less, depending on the system configuration.

FIG. 6 presents a high-level view of various functional blocks within anexemplary HMM engine 13 within a hardware accelerator 8, on the FPGA orASIC 7. Specifically, within the hardware HMM accelerator 8 there aremultiple clusters 11, and within each cluster 11 there are multipleengines 13. FIG. 6 presents a single instance of an HMM engine 13. Ascan be seen with respect to FIG. 6, the engine 13 may include aninstance bus interface 12, a plurality of memories, e.g., an HMEM 16 andan RMEM 18, various other components 17, HMM control logic 15, as wellas a result output interface 19. Particularly, on the engine side, theHMM instance bus 12 is operably connected to the memories, HMEM 16 andRMEM 18, and may include interface logic that communicates with thecluster hub 11, which hub is in communications with the distributor 9,which in turn is communicating with the PCIe interface 4 thatcommunicates with the variant call software being run by the CPU/GPUand/or server 1000. The HMM instance bus 12, therefore, receives thedata from the CPU 1000 and loads it into one or more of the memories,e.g., the HMEM and RMEM. This configuration may also be implemented inone or more quantum circuits and adapted accordingly.

In these instances, enough memory space should be allocated such that atleast one or two or more haplotypes, e.g., two haplotypes, may beloaded, e.g., in the HMEM 16, per given read sequence that is loaded,e.g., into the RMEM 18, which when multiple haplotypes are loadedresults in an easing of the burden on the PCIe bus 5 bandwidth. Inparticular instances, two haplotypes and two read sequences may beloaded into their respective memories, which would allow the foursequences to be processed together in all relevant combinations. Inother instances four, or eight, or sixteen sequences, e.g., pairs ofsequences, may be loaded, and in like manner be processed incombination, such as to further ease the bandwidth when desired.

Additionally, enough memory may be reserved such that a ping-pongstructure may be implemented therein such that once the memories areloaded with a new job 20 a, such as on the ping side of the memory, anew job signal is indicated, and the control logic 15 may beginprocessing the new job 20 a, such as by generating the matrix andperforming the requisite calculations, as described herein and below.Accordingly, this leaves the pong side of the memory available so as tobe loaded up with another job 20 b, which may be loaded therein whilethe first job 20 a is being processed, such that as the first job 20 ais finished, the second job 20 b may immediately begin to be processedby the control logic 15.

In such an instance, the matrix for job 20 b may be preprocessed so that there is virtually no downtime, e.g., one or two clock cycles, between the end of processing of the first job 20 a and the beginning of processing of the second job 20 b. Hence, when utilizing both the ping and pong sides of the memory structures, the HMEM 16 may typically store 4 haplotype sequences, e.g., two apiece, and the RMEM 18 may typically store 2 read sequences. This ping-pong configuration is useful because it requires only a little extra memory space, but allows for a doubling of the throughput of the engine 13.

During and/or after processing the memories 16, 18 feed into thetransition probabilities calculator and lookup table (LUT) block 17 a,which is configured for calculating various information related to“Priors” data, as explained below, which in turn feeds the Prior resultsdata into the M, I, and D state calculator block 17 b, for use whencalculating transition probabilities. One or more scratch RAMs 17 c mayalso be included, such as for holding the M, I, and D states at theboundary of the swath, e.g., the values of the bottom row of theprocessing swath, which as indicated, in various instances, may be anysuitable amount of cells, e.g., about 10 cells, in length so as to becommensurate with the length of the swath 35.

Additionally, a separate results output interface block 19 may be included so that when the sums are finished they, e.g., four 32-bit words, can immediately be transmitted back to the variant call software of the CPU/GPU/QPU 1000. It is to be noted that this configuration may be adapted so that the system 1, specifically the M, I, and D calculator 17 b, is not held up waiting for the output interface 19 to clear, e.g., so long as it does not take as long to clear the results as it does to perform the job 20. Hence, in this configuration, there may be three pipeline steps functioning in concert to make an overall system pipeline, such as loading the memory, performing the MID calculations, and outputting the results. Further, it is noted that any given HMM engine 13 is one of many, each with its own output interface 19; however, they may share a common interface 10 back to the data distributor 9. Hence, the cluster hub 11 will include management capabilities to manage the transfer ("xfer") of information through the HMM accelerator 8 so as to avoid collisions.

Accordingly, the following details the processes being performed within each module of the HMM engines 13 as it receives the haplotype and read sequence data, processes it, and outputs results data pertaining to the same, as generally outlined above. Specifically, the high-bandwidth computations in the HMM engine 13, within the HMM cluster 11, are directed to computing and/or updating the match (M), insert (I), and delete (D) state values, which are employed in determining whether the particular read being examined matches the haplotype reference as well as the extent of the same, as described above. Particularly, the read, along with the Phred score and GOP value for each base in the read, is transmitted to the cluster 11 from the distributor 9 and is thereby assigned to a particular processing engine 13 for processing. These data are then used by the M, I, and D calculator 17 of the processing engine 13 to determine whether the called base in the read is more or less likely to be correct and/or to be a match to its respective base in the haplotype, or to be the product of a variation, e.g., an insert or deletion; and/or if there is a variation, whether such variation is the likely result of a true variability in the haplotype or rather an artifact of an error in the sequence generating and/or mapping and/or aligning systems.

As indicated above, a part of such analysis includes the MID calculator17 determining the transition probabilities from one base to another inthe read going from one M, I, or D state to another in comparison to thereference, such as from a matching state to another matching state, or amatching state to either an insertion state or to a deletion state. Inmaking such determinations each of the associated transitionprobabilities is determined and considered when evaluating whether anyobserved variation between the read and the reference is a truevariation and not just some machine or processing error. For thesepurposes, the Phred score for each base being considered is useful indetermining the transition probabilities in and out of the match state,such as going from a match state to an insert or deletion, e.g., agapped, state in the comparison. Likewise, the transition probabilitiesof continuing a gapped state or going from a gapped state, e.g., aninsert or deletion state, back to a match state are also determined. Inparticular instances, the probabilities in or out of the delete orinsert state, e.g., exiting a gap continuation state, may be a fixedvalue, and may be referenced herein as the gap continuation probabilityor penalty. Nevertheless, in various instances, such gap continuationpenalties may be floating and therefore subject to change dependent onthe accuracy demands of the system configuration.

Accordingly, as depicted with respect to FIGS. 7 and 8 each of the M, I,and D state values are computed for each possible read and haplotypebase pairing. In such an instance, a virtual matrix 30 of cellscontaining the read sequence being evaluated on one axis of the matrixand the associated haplotype sequence on the other axis may be formed,such as where each cell in the matrix represents a base position in theread and haplotype reference. Hence, if the read and haplotype sequencesare each 100 bases in length, the matrix 30 will include 100 by 100cells, a given portion of which may need to be processed in order todetermine the likelihood and/or extent to which this particular readmatches up with this particular reference. Hence, once virtually formed,the matrix 30 may then be used to determine the various statetransitions that take place when moving from one base in the readsequence to another and comparing the same to that of the haplotypesequence, such as depicted in FIGS. 7 and 8. Specifically, theprocessing engine 13 is configured such that a multiplicity of cells maybe processed in parallel and/or sequential fashion when traversing thematrix with the control logic 15. For instance, as depicted in FIG. 7, avirtual processing swath 35 is propagated and moves across and down thematrix 30, such as from left to right, processing the individual cellsof the matrix 30 down the right to left diagonal.

More specifically, as can be seen with respect to FIG. 7, each individual virtual cell within the matrix 30 includes an M, I, and D state value that needs to be calculated so as to assess the nature of the identity of the called base, and as depicted in FIG. 7 the data dependencies for each cell in this process may clearly be seen. Hence, for determining a given M state of a present cell being processed, the Match, Insert, and Delete states of the cell diagonally above the present cell need to be pushed into the present cell and used in the calculation of the M state of the cell presently being calculated (e.g., thus, the diagonal downwards, forwards progression through the matrix is indicative of matching).

However, for determining the I state, only the Match and Insert states for the cell directly above the present cell need be pushed into the present cell being processed (thus, the vertical downwards "gapped" progression when continuing in an insertion state). Likewise, for determining the D state, only the Match and Delete states for the cell directly left of the present cell need be pushed into the present cell (thus, the horizontal cross-wards "gapped" progression when continuing in a deletion state). As can be seen with respect to FIG. 7, after computation of cell 1 (the shaded cell in the top most row) begins, the processing of cell 2 (the shaded cell in the second row) can also begin, without waiting for any results from cell 1, because there are no data dependencies between this cell in row 2 and the cell of row 1 where processing begins. This forms a reverse diagonal 35 where processing proceeds downwards and to the left, as shown by the red arrow. This reverse diagonal 35 processing approach increases the processing efficiency and throughput of the overall system. Likewise, the data generated in cell 1 can immediately be pushed forward to the cell down and to the right of the top most cell 1, thereby advancing the swath 35 forward.

For instance, FIG. 7 depicts an exemplary HMM matrix structure 35 showing the hardware processing flow. The matrix 35 includes the haplotype base index, e.g., containing 36 bases, positioned to run along the top edge of the horizontal axis, and further includes the base read index, e.g., 10 bases, positioned to fall along the side edge of the vertical axis in such a manner as to form a structure of cells where a selection of the cells may be populated with an M, I, and D probability state, and the transition probabilities of transitioning from the present state to a neighboring state. In such an instance, as described in greater detail above, a move from a match state to a match state results in a forwards diagonal progression through the matrix 30, while moving from a match state to an insertion state results in a vertical downwards progressing gap, and a move from a match state to a deletion state results in a horizontal progressing gap. Hence, as depicted in FIG. 8, for a given cell, when determining the match, insert, and delete states for each cell, the match, insert, and delete probabilities of its three adjoining cells are employed.

The downwards arrow in FIG. 7 represents the parallel and sequentialnature of the processing engine(s) that are configured so as to producea processing swath or wave 35 that moves progressively along the virtualmatrix in accordance with the data dependencies, see FIGS. 7 and 8, fordetermining the M, I, and D states for each particular cell in thestructure 30. Accordingly, in certain instances, it may be desirable tocalculate the identities of each cell in a downwards and diagonalmanner, as explained above, rather than simply calculating each cellalong a vertical or horizontal axis exclusively, although this can bedone if desired. This is due to the increased wait time, e.g., latency,that would be required when processing the virtual cells of the matrix35 individually and sequentially along the vertical or horizontal axisalone, such as via the hardware configuration.

For instance, in such an instance, when moving linearly and sequentiallythrough the virtual matrix 30, such as in a row by row or column bycolumn manner, in order to process each new cell the state computationsof each preceding cell would have to be completed, thereby increasinglatency time overall. However, when propagating the M, I, Dprobabilities of each new cell in a downwards and diagonal fashion, thesystem 1 does not have to wait for the processing of its preceding cell,e.g., of row one, to complete before beginning the processing of anadjoining cell in row two of the matrix. This allows for parallel andsequential processing of cells in a diagonal arrangement to occur, andfurther allows the various computational delays of the pipelineassociated with the M, I, and D state calculations to be hidden.Accordingly, as the swath 35 moves across the matrix 30 from left toright, the computational processing moves diagonally downwards, e.g.,towards the left (as shown by the arrow in FIG. 7). This configurationmay be particularly useful for hardware and/or quantum circuitimplementations, such as where the memory and/or clock-by-clock latencyare a primary concern.

In these configurations, the actual value output from each call of anHMM engine 13, e.g., after having calculated the entire matrix 30, maybe a bottom row (e.g., Row 35 of FIG. 21) containing M, I, and D states,where the M and I states may be summed (the D states may be ignored atthis point having already fulfilled their function in processing thecalculations above), so as to produce a final sum value that may be asingle probability that estimates, for each read and haplotype index,the probability of observing the read, e.g., assuming the haplotype wasthe true original DNA sampled.

Particularly, the outcome of the processing of the matrix 30, e.g., of FIG. 7, may be a single value representing the probability that the read is an actual representation of that haplotype. This probability is a value between 0 and 1 and is formed by summing all of the M and I states from the bottom row of cells in the HMM matrix 30. Essentially, what is being assessed is the possibility that something could have gone wrong in the sequencer, or associated DNA preparation methods prior to sequencing, so as to incorrectly produce a mismatch, insertion, or deletion into the read that is not actually present within the subject's genetic sequence. In such an instance, the read is not a true reflection of the subject's actual DNA.
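By way of a purely illustrative companion to this description, a minimal sketch of the final-sum step is set forth below; the function name final_sum and the list-of-lists matrix layout are assumptions made for the example, not the hardware interface.

# Minimal sketch of the per-job result described above: once the matrix is
# complete, the output probability is the sum of the M and I state values
# along the bottom row (the row for the last read base). Names are illustrative.
def final_sum(M, I):
    bottom = len(M) - 1                     # index of the last read base's row
    return sum(M[bottom]) + sum(I[bottom])  # P(read | haplotype), a value in [0, 1]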

Hence, accounting for such production errors, it can be determined whatany given read actually represents with respect to the haplotype, andthereby allows the system to better determine how the subject's geneticsequence, e.g., en masse, may differ from that of a reference sequence.For instance, many haplotypes may be run against many read sequences,generating scores for all of them, and determining based on whichmatches have the best scores, what the actual genomic sequence identityof the individual is and/or how it truly varies from a reference genome.

More particularly, FIG. 8 depicts an enlarged view of a portion of the HMM state matrix 30 from FIG. 7. As shown in FIG. 8, given the internal composition of each cell in the matrix 30, as well as the structure of the matrix as a whole, the M, I, and D state probability for any given "new" cell being calculated is dependent on the M, I, and D states of several of its surrounding neighbors that have already been calculated. Particularly, as shown in greater detail with respect to FIGS. 1 and 16, in an exemplary configuration, there may be approximately a 0.9998 probability of going from a match state to another match state, and there may be only a 0.0001 probability (gap open penalty) of going from a match state to either an insertion or a deletion, e.g., gapped, state. Further, when in either a gapped insertion or gapped deletion state there may be only a 0.1 probability (gap extension or continuation penalty) of staying in that gapped state, while there is a 0.9 probability of returning to a match state. It is to be noted that, according to this model, all of the probabilities into or out of a given state should sum to one. Particularly, the processing of the matrix 30 revolves around calculating the transition probabilities, accounting for the various gap open or gap continuation penalties, and a final sum is calculated.
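By way of illustration only, the recurrence just described may be sketched in a few lines of Python. This is a minimal linear-domain model using the example probabilities quoted above, not the hardware implementation; the function and parameter names (mid_cell_update, p_mm, and the like) are hypothetical.

# Minimal linear-domain sketch of the per-cell M, I, D recurrence, assuming the
# example transition probabilities quoted above; all names are illustrative.
def mid_cell_update(M, I, D, i, j, prior,
                    p_mm=0.9998,   # Match -> Match
                    p_mi=0.0001,   # Match -> Insert (gap open)
                    p_md=0.0001,   # Match -> Delete (gap open)
                    p_ii=0.1,      # Insert -> Insert (gap continuation)
                    p_dd=0.1,      # Delete -> Delete (gap continuation)
                    p_im=0.9,      # Insert -> Match (gap close)
                    p_dm=0.9):     # Delete -> Match (gap close)
    """Fill cell (i, j): i indexes the read (vertical axis), j the haplotype
    (horizontal axis). Neighbors outside the matrix contribute zero."""
    diag = lambda S: S[i-1][j-1] if i > 0 and j > 0 else 0.0  # cell up and to the left
    up   = lambda S: S[i-1][j]   if i > 0 else 0.0            # cell directly above
    left = lambda S: S[i][j-1]   if j > 0 else 0.0            # cell directly to the left

    M[i][j] = prior * (diag(M) * p_mm + diag(I) * p_im + diag(D) * p_dm)
    I[i][j] = up(M) * p_mi + up(I) * p_ii
    D[i][j] = left(M) * p_md + left(D) * p_dd

Counting the operations (three state multiplies, two adds, and the Prior multiply for the M line; two multiplies and one add each for the I and D lines) reproduces the eight multiplications and four additions per cell discussed below.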

Hence, these calculated state transition probabilities are derivedmainly from the directly adjoining cells in the matrix 30, such as fromthe cells that are immediately to the left of, the top of, anddiagonally up and left of that given cell presently being calculated, asseen in FIG. 16. Additionally, the state transition probabilities may inpart be derived from the “Phred” quality score that accompanies eachread base. These transition probabilities, therefore, are useful incomputing the M, I, and D state values for that particular cell, andlikewise for any associated new cell being calculated. It is to be notedthat as described herein, the gap open and gap continuation penaltiesmay be fixed values, however, in various instances, the gap open and gapcontinuation penalties may be variable and therefore programmable withinthe system, albeit by employing additional hardware resources dedicatedto determining such variable transition probability calculations. Suchinstances may be useful where greater accuracy is desired. Nevertheless,when such values are assumed to be constant, smaller resource usageand/or chip size may be achieved, leading to greater processing speed,as explained below.

Accordingly, there is a multiplicity of calculations and/or othermathematical computations, such as multiplications and/or additions,which are involved in deriving each new M, I, and D state value. In suchan instance, such as for calculating maximum throughput, the primitivemathematical computations involved in each M, I, and D transition statecalculation may be pipelined. Such pipelining may be configured in a waythat the corresponding clock frequencies are high, but where thepipeline depth may be non-trivial. Further, such a pipeline may beconfigured to have a finite depth, and in such instances it may takemore than one clock cycle to complete the operations.

For instance, these computations may be run at high speeds inside theprocessor 7, such as at about 300 MHz. This may be achieved such as bypipelining the FPGA or ASIC heavily with registers so littlemathematical computation occurs between each flip-flop. This pipelinestructure results in multiple cycles of latency in going from the inputof the match state to the output, but given the reverse diagonalcomputing structure, set forth in FIG. 7 above, these latencies may behidden over the entire HMM matrix 30, such as where each cell representsone clock cycle.

Hence, the number of M, I, and D state calculations may be limited. Insuch an instance, the processing engine 13 may be configured in such amanner that a grouping, e.g., swath 35, of cells in a number of rows ofthe matrix 30 may be processed as a group (such as in adown-and-left-diagonal fashion as illustrated by the arrow in FIG. 7)before proceeding to the processing of a second swath below, e.g., wherethe second swath contains the same number of cells in rows to beprocessed as the first. In a manner such as this, a hardwareimplementation of an accelerator 8, as described herein, may be adaptedso as to make the overall system more efficient, as described above.

Particularly, FIG. 9 sets forth an exemplary computational structure for performing the various state processing calculations herein described. More particularly, FIG. 9 sets forth three dedicated logic blocks 17 of the processing engine 13 for computing the state computations involved in generating each M, I, and D state value for each particular cell, or grouping of cells, being processed in the HMM matrix 30. These logic blocks may be implemented in hardware, but in some instances, may be implemented in software, such as for being performed by one or more quantum circuits. As can be seen with respect to FIG. 9, the match state computation 15 a is more involved than either the insert 15 b or deletion 15 c computations; this is because in calculating the match state 15 a of the present cell being processed, all of the previous match, insert, and delete states of the adjoining cells, along with various "Priors" data, are included in the present match computation (see FIGS. 9 and 10), whereas only the match and either the insert or the delete states are included in their respective calculations. Hence, as can be seen with respect to FIG. 9, in calculating a match state, three state multipliers, as well as two adders, and a final multiplier, which accounts for the Prior, e.g., Phred, data, are included. However, for calculating the I or D state, only two multipliers and one adder are included. It is noted that, in hardware, multipliers are more resource intensive than adders.

Accordingly, to various extents, the M, I, and D state values forprocessing each new cell in the HMM matrix 30 uses the knowledge orpre-computation of the following values, such as the “previous” M, I,and D state values from left, above, and/or diagonally left and above ofthe currently-being-computed cell in the HMM matrix. Additionally, suchvalues representing the prior information, or “Priors”, may at least inpart be based on the “Phred” quality score, and whether the read baseand the reference base at a given cell in the matrix 30 match or aredifferent. Such information is particularly useful when determining amatch state. Specifically, as can be seen with respect to FIG. 9, insuch instances, there are basically seven “transition probabilities”(M-to-M, I-to-M, D-to-M, I-to-I, M-to-I, D-to-D, and M-to-D) thatindicate and/or estimate the probability of seeing a gap open, e.g., ofseeing a transition from a match state to an insert or delete state;seeing a gap close; e.g., going from an insert or delete state back to amatch state; and seeing the next state continuing in the same state asthe previous state, e.g., Match-to-Match, Insert-to-Insert,Delete-to-Delete.

The state values (e.g., in any cell to be processed in the HMM matrix 30), Priors, and transition probabilities are all values in the range of [0, 1]. Additionally, there are also known starting conditions for cells that are on the left or top edge of the HMM matrix 30. As can be seen from the logic 15 a of FIG. 9, there are four multiplication and two addition computations that may be employed in the particular M state calculation being determined for any given cell being processed. Likewise, as can be seen from the logic of 15 b and 15 c, there are two multiplications and one addition involved for each I state and each D state calculation, respectively. Collectively, along with the Priors multiplier, this sums to a total of eight multiplications and four addition operations for the M, I, and D state calculations associated with each single cell in the HMM matrix 30 to be processed.

The final sum output, e.g., row 34 of FIG. 16, of the computation of the matrix 30, e.g., for a single job 20 of comparing one read to one or two haplotypes, is the summation of the final M and I states across the entire bottom row 34 of the matrix 30, which is the final sum value that is output from the HMM accelerator 8 and delivered to the CPU/GPU/QPU 1000. This final summed value represents how well the read matches the haplotype(s). The value is a probability, e.g., of less than one, for a single job 20 a that may then be compared to the output resulting from another job 20 b, such as from the same active region 500. It is noted that there are on the order of 20 trillion HMM cells to evaluate in a "typical" human genome at 30X coverage, where these 20 trillion HMM cells are spread across about 1 to 2 billion HMM matrices 30 of all associated HMM jobs 20.

The results of such calculations may then be compared one against theother so as to determine, in a more precise manner, how the geneticsequence of a subject differs, e.g., on a base by base comparison, fromthat of one or more reference genomes. For the final sum calculation,the adders already employed for calculating the M, I, and/or D states ofthe individual cells may be re-deployed so as to compute the final sumvalue, such as by including a mux into a selection of the re-deployedadders thereby including one last additional row, e.g., with respect tocalculation time, to the matrix so as to calculate this final sum, whichif the read length is 100 bases amounts to about a 1% overhead. Inalternative embodiments, dedicated hardware resources can be used forperforming such calculations. In various instances, the logic for theadders for the M and D state calculations may be deployed forcalculating the final sum, which D state adder may be efficientlydeployed since it is not otherwise being used in the final processingleading to the summing values.

In certain instances, these calculations and relevant processes may be configured so as to correspond to the output of a given sequencing platform, such as including an ensemble of sequencers, which as a collective may be capable of outputting (on average) a new human genome at 30× coverage every 28 minutes (though they come out of the sequencer ensemble in groups of about 150 genomes every three days). In such an instance, when the present mapping, aligning, and variant calling operations are configured to fit within such a sequencing platform of processing technologies, a portion of the 28 minutes (e.g., about 10 minutes) it takes for the sequencing cluster to sequence a genome may be used by a suitably configured mapper and/or aligner, as herein described, so as to take the image/BCL/FASTQ file results from the sequencer and perform the steps of mapping and/or aligning the genome, e.g., post-sequencer processing. That leaves about 18 minutes of the sequencing time period for performing the variant calling step, of which the HMM operation is the main computational component, such as prior to the nucleotide sequencer sequencing the next genome, such as over the next 28 minutes. Accordingly, in such instances, 18 minutes may be budgeted to computing the 20 trillion HMM cells that need to be processed in accordance with the processing of a genome, such as where each of the HMM cells to be processed includes about twelve mathematical operations (e.g., eight multiplications and/or four addition operations). Such a throughput allows for the following computational dynamics: (20 trillion HMM cells) × (12 math ops per cell) / (18 minutes × 60 seconds/minute), which is about 222 billion operations per second of sustained throughput.
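The sustained-throughput figure quoted above can be reproduced with simple arithmetic; the snippet below is only a back-of-the-envelope check using the numbers stated in the text, not a measured result.

# Back-of-the-envelope check of the sustained-throughput figure quoted above
# (assumed numbers reproduced from the text, not measurements).
hmm_cells = 20e12          # ~20 trillion HMM cells per 30x human genome
ops_per_cell = 12          # ~8 multiplications + 4 additions per cell
budget_seconds = 18 * 60   # ~18 minutes left of the 28-minute sequencing window

ops_per_second = hmm_cells * ops_per_cell / budget_seconds
print(f"{ops_per_second / 1e9:.0f} billion operations per second")  # ~222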

FIG. 10 sets forth the logic blocks 17 of the processing engine of FIG.9 including exemplary M, I, and D state update circuits that present asimplification of the circuit provided in FIG. 9. The system may beconfigured so as to not be memory-limited, so a single HMM engineinstance 13 (e.g., that computes all of the single cells in the HMMmatrix 30 at a rate of one cell per clock cycle, on average, plusoverheads) may be replicated multiple times (at least 65-70 times tomake the throughput efficient, as described above). Nevertheless, tominimize the size of the hardware, e.g., the size of the chip 2 and/orits associated resource usage, and/or in a further effort to include asmany HMM engine instances 13 on the chip 2 as desirable and/or possible,simplifications may be made with regard to the logic blocks 15 a′-c′ ofthe processing instance 13 for computing one or more of the transitionprobabilities to be calculated.

In particular, it may be assumed that the gap open penalty (GOP) and gapcontinuation penalty (GCP), as described above, such as for inserts anddeletes are the same and are known prior to chip configuration. Thissimplification implies that the I-to-M and D-to-M transitionprobabilities are identical. In such an instance, one or more of themultipliers, e.g., set forth in FIG. 9, may be eliminated, such as bypre-adding I and D states before multiplying by a common Indel-to-Mtransition probability. For instance, in various instances, if the I andD state calculations are assumed to be the same, then the statecalculations per cell can be simplified as presented in FIG. 10.Particularly, if the I and D state values are the same, then the I stateand the D state may be added and then that sum may be multiplied by asingle value, thereby saving a multiply. This may be done because, asseen with respect to FIG. 10, the gap continuation and/or closepenalties for the I and D states are the same. However, as indicatedabove, the system can be configured to calculate different values forboth the I and D transition state probabilities, and in such aninstance, this simplification would not be employed.
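A minimal sketch of this simplification, assuming identical insert and delete penalties so that a single Indel-to-Match probability may be shared, is set forth below; the names are illustrative and are not taken from the figures.

# Sketch of the simplified M-state update of FIG. 10: because I-to-M and
# D-to-M are assumed identical, the diagonal I and D states are added first
# and multiplied once by a shared Indel-to-Match probability, removing one
# multiplier relative to the general form. Names are illustrative.
def m_state_simplified(m_diag, i_diag, d_diag, prior, p_mm, p_indel2m):
    return prior * (m_diag * p_mm + (i_diag + d_diag) * p_indel2m)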

Additionally, in a further simplification, rather than dedicate chip orother computing resources configured specifically to perform the finalsum operation at the bottom of the HMM matrix, the present HMMaccelerator 8 may be configured so as to effectively append one or moreadditional rows to the HMM matrix 30, with respect to computationaltime, e.g., overhead, it takes to perform the calculation, and may alsobe configured to “borrow” one or more adders from the M-state 15 a andD-state 15 c computation logic such as by MUXing in the final sum valuesto the existing adders as needed, so as to perform the actual finalsumming calculation. In such an instance, the final logic, including theM logic 15 a, I logic 15 b, and D logic 15 c blocks, which blockstogether form part of the HMM MID instance 17, may include 7 multipliersand 4 adders along with the various MUXing involved.

Accordingly, FIG. 10 sets forth the M, I, and D state update circuits 15a′, 15 b′, and 15 c′ including the effects of simplifying assumptionsrelated to transition probabilities, as well as the effect of sharingvarious M, I, and/or D resources, e.g., adder resources, for the finalsum operations. A delay block may also be added to the M-state path inthe M-state computation block, as shown in FIG. 10. This delay may beadded to compensate for delays in the actual hardware implementations ofthe multiply and addition operations, and/or to simplify the controllogic, e.g., 15.

As shown in FIGS. 9 and 10, these respective multipliers and/or adders may be floating point multipliers and adders. However, in various instances, as can be seen with respect to FIG. 11, a log domain configuration may be implemented, where in such a configuration all of the multiplies turn into adds. FIG. 11 presents what the log domain calculation would look like if all the multipliers turned into adders, e.g., 15 a″, 15 b″, and 15 c″, such as occurs when employing a log domain computational configuration. Particularly, all of the multiplier logic turns into an adder, but the adder itself turns into, or otherwise includes, a function such as: f(a,b) = max(a,b) − log₂(1 + 2^(−|a−b|)), where the log portion of the equation may be maintained within a LUT whose depth and physical size is determined by the precision required.

Given the typical read and haplotype sequence lengths, as well as the values typically seen for read quality (Phred) scores and for the related transition probabilities, the dynamic range requirements on the internal HMM state values may be quite severe. For instance, when implementing the HMM module in software, various of the HMM jobs 20 may result in underruns, such as when implemented on single-precision (32-bit) floating-point state values. This implies a dynamic range that is greater than 80 powers of 10, thereby requiring the variant call software to bump up to double-precision (64-bit) floating point state values. However, full 64-bit double-precision floating-point representation may, in various instances, have some negative implications; for example, if compact, high-speed hardware is to be implemented, both storage and compute pipeline resource requirements will need to be increased, thereby occupying greater chip space and/or slowing timing. In such instances, a fixed-point-only linear-domain number representation may be implemented. Nevertheless, the dynamic range demands on the state values, in this embodiment, make the bit widths involved in certain circumstances less than desirable. Accordingly, in such instances, fixed-point-only log-domain number representation may be implemented, as described herein.

In such a scheme, as can be seen with respect to FIG. 11, instead of representing the actual state value in memory and computations, the −log-base-2 of the number may be represented. This may have several advantages, including employing multiply operations in linear space that translate into add operations in log space; and/or this log domain representation of numbers inherently supports wider dynamic range with only small increases in the number of integer bits. These log-domain M, I, D state update calculations are set forth in FIGS. 11 and 12.

As can be seen when comparing the logic 17 configuration of FIG. 11 with that of FIG. 9, the multiply operations go away in the log-domain. Rather, they are replaced by add operations, and the add operations are morphed into a function that can be expressed as a max operation followed by a correction factor addition, e.g., via a LUT, where the correction factor is a function of the difference between the two values being summed in the log-domain. Such a correction factor can be either computed or generated from the look-up-table. Whether a correction factor computation or look-up-table implementation is more efficient depends on the required precision (bit width) on the difference between the sum values. In particular instances, therefore, the number of log-domain bits for state representation can be in the neighborhood of 8 to 12 integer bits plus 6 to 24 fractional bits, depending on the level of quality desired for any given implementation. This implies somewhere between 14 and 36 bits total for log-domain state value representation. Further, it has been determined that there are log-domain fixed-point representations that can provide acceptable quality and acceptable hardware size and speed.
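The max-plus-correction identity described above may be illustrated with the short sketch below. It assumes state values are stored as the plain log-base-2 of their linear-domain counterparts, in which case the correction term is added; with the negative-log-base-2 representation described above, the same identity holds with min in place of max and the correction subtracted. The helper names are assumptions for the example only.

import math

# Minimal sketch of log-domain arithmetic: linear-domain multiplies become
# adds, and linear-domain adds become a max plus a small correction term.
# The correction, log2(1 + 2**(-|a - b|)), is what a hardware LUT would hold,
# indexed by the difference between the two log-domain operands.
def log_mul(a, b):
    return a + b

def log_add(a, b):
    diff = abs(a - b)
    correction = math.log2(1.0 + 2.0 ** (-diff))  # in hardware: LUT lookup on diff
    return max(a, b) + correction

# Quick sanity check against linear-domain arithmetic:
x, y = 0.125, 0.0625
assert abs(log_add(math.log2(x), math.log2(y)) - math.log2(x + y)) < 1e-12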

In various instances, one read sequence is typically processed for eachHMM job 20, which as indicated may include a comparison against twohaplotype sequences. And like above for the haplotype memory, aping-pong structure may also be used in the read sequence memory 18 toallow various software implemented functions the ability to write newHMM job information 20 b while a current job 20 a is still beingprocessed by the HMM engine instance 13. Hence, a read sequence storagerequirement may be for a single 1024×32 two-port memory (such as oneport for write, one port for read, and/or separate clocks for write andread ports).

Particularly, as described above, in various instances, the architectureemployed by the system 1 is configured such that in determining whethera given base in a sequenced sample genome matches that of acorresponding base in one or more reference genomes, a virtual matrix 30is formed, wherein the reference genome is theoretically set across ahorizontal axis, while the sequenced reads, representing the samplegenome, is theoretically set in descending fashion down the verticalaxis. Consequently, in performing an HMM calculation, the HMM processingengine 13, as herein described, is configured to traverse this virtualHMM matrix 30. Such processing can be depicted as in FIG. 7, as a swath35 moving diagonally down and across the virtual array performing thevarious HMM calculations for each cell of the virtual array, as seen inFIG. 8.

More particularly, this theoretical traversal involves processing afirst grouping of rows of cells 35 a from the matrix 30 in its entirety,such as for all haplotype and read bases within the grouping, beforeproceeding down to the next grouping of rows 35 b (e.g., the next groupof read bases). In such an instance, the M, I, and D state values forthe first grouping are stored at the bottom edge of that initialgrouping of rows so that these M, I, and D state values can then be usedto feed the top row of the next grouping (swath) down in the matrix 30.In various instances, the system 1 may be configured to allow up to 1008length haplotypes and/or reads in the HMM accelerator 8, and since thenumerical representation employs W-bits for each state, this implies a1008 word×W-bit memory for M, I, and D state storage.

Accordingly, as indicated, such memory could be either a single-port or double-port memory. Additionally, a cluster-level scratch pad memory, e.g., for storing the results of the swath boundary, may also be provided. For instance, in accordance with the disclosure above, the memories discussed already are configured on a per-engine-instance 13 basis. In particular HMM implementations, multiple engine instances 13a-(n+1) may be grouped into a cluster 11 that is serviced by a single connection, e.g., PCIe bus 5, to the PCIe interface 4 and DMA 3 via CentCom 9. Multiple clusters 11a-(n+1) can be instantiated so as to more efficiently utilize PCIe bandwidth using the existing CentCom 9 functionality.

Hence, in a typical configuration, somewhere between 16 and 64 engines 13(m) are instantiated within a cluster 11(n), and one to four clusters might be instantiated in a typical FPGA/ASIC implementation of the HMM 8 (e.g., depending on whether it is a dedicated HMM FPGA image or whether the HMM has to share FPGA real estate with the sequencer/mapper/aligner and/or other modules, as herein disclosed). In particular instances, there may be a small amount of memory used at the cluster-level 11 in the HMM hardware. This memory may be used as an elastic First In First Out ("FIFO") buffer to capture output data from the HMM engine instances 13 in the cluster and pass it on to CentCom 9 for further transmittal back to the software of the CPU 1000 via the DMA 3 and PCIe 4. In theory, this FIFO could be very small (on the order of two 32-bit words), as data are typically passed on to CentCom 9 almost immediately after arriving in the FIFO. However, to absorb potential disruptions in the output data path, the size of this FIFO may be made parametrizable. In particular instances, the FIFO may be used with a depth of 512 words. Thus, the cluster-level storage requirements may be a single 512×32 two-port memory (separate read and write ports, same clock domain).

FIG. 12 sets forth the various HMM state transitions 17 b depicting therelationship between Gap Open Penalties (GOP), Gap Close Penalties(GCP), and transition probabilities involved in determining whether andhow well a given read sequence matches a particular haplotype sequence.In performing such an analysis, the HMM engine 13 includes at leastthree logic blocks 17 b, such as a logic block for determining a matchstate 15 a, a logic block for determining an insert state 15 b, and alogic block for determining a delete state 15 c. These M, I, and D statecalculation logic 17 when appropriately configured function efficientlyto avoid high-bandwidth bottlenecks, such as of the HMM computationalflow. However, once the M, I, D core computation architecture isdetermined, other system enhancements may also be configured andimplemented so as to avoid the development of other bottlenecks withinthe system.

Particularly, the system 1 may be configured so as to maximize theprocess of efficiently feeding information from the computing core 1000to the variant caller module 2 and back again, so as not to produceother bottlenecks that would limit overall throughput. One such blockthat feeds the HMM core M, I, D state computation logic 17 is thetransition probabilities and priors calculation block. For instance, ascan be seen with respect to FIG. 9, each clock cycle employs thepresentation of seven transition probabilities and one Prior at theinput to the M, I, D state computation block 15 a. However, after thesimplifications that result in the architecture of FIG. 10, only fourunique transition probabilities and one Prior are employed for eachclock cycle at the input of the M, I, D state computation block.Accordingly, in various instances, these calculations may be simplifiedand the resulting values generated. Thus, increasing throughput,efficiency, and reducing the possibility of a bottleneck forming at thisstage in the process.

Additionally, as described above, the Priors are values generated via the read quality, e.g., Phred score, of the particular base being investigated and whether, or not, that base matches the hypothesis haplotype base for the current cell being evaluated in the virtual HMM matrix 30. The relationship can be described via the equations below. First, the read Phred in question may be expressed as a probability = 10^(−(read Phred/10)). Then the Prior can be computed based on whether the read base matches the hypothesis haplotype base: if the read base and hypothesis haplotype base match, Prior = 1 − (read Phred expressed as a probability); otherwise, Prior = (read Phred expressed as a probability)/3. The divide-by-three operation in this last equation reflects the fact that there are only four possible bases (A, C, G, T). Hence, if the read and haplotype base did not match, then it must be one of the three remaining possible bases that does match, and each of the three possibilities is modeled as being equally likely.

The per-read-base Phred scores are delivered to the HMM hardware accelerator 8 as 6-bit values. The equations to derive the Priors, then, have 64 possible outcomes for the "match" case and an additional 64 possible outcomes for the "don't match" case. This may be efficiently implemented in the hardware as a 128 word look-up-table, where the address into the look-up-table is a 7-bit quantity formed by concatenating the Phred value with a single bit that indicates whether, or not, the read base matches the hypothesis haplotype base.
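By way of illustration, the 128 word Priors look-up-table just described may be modeled as set forth below, assuming (for the example only) that the 6-bit Phred value occupies the upper address bits and the match/mismatch bit the lowest bit.

# Hedged sketch of the Priors look-up-table described above: a 6-bit Phred
# value concatenated with a match/mismatch bit gives a 7-bit address into a
# 128-entry table whose contents follow the two equations in the text.
def build_priors_lut():
    lut = [0.0] * 128
    for phred in range(64):                      # 6-bit Phred score
        p_err = 10.0 ** (-phred / 10.0)          # Phred expressed as a probability
        for match in (0, 1):                     # 1 = read base matches haplotype base
            addr = (phred << 1) | match          # 7-bit concatenated address (assumed order)
            lut[addr] = (1.0 - p_err) if match else p_err / 3.0
    return lut

priors_lut = build_priors_lut()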

Further, with respect to determining the match-to-insert and/or match-to-delete probabilities, in various implementations of the architecture for the HMM hardware accelerator 8, separate gap open penalties (GOP) can be specified for the Match-to-Insert state transition and the Match-to-Delete state transition, as indicated above. This equates to the M2I and M2D values in the state transition diagram of FIG. 12 being different. As the GOP values are delivered to the HMM hardware accelerator 8 as 6-bit Phred-like values, the gap open transition probabilities can be computed in accordance with the following equations: M2I transition probability = 10^(−(read GOP(I)/10)) and M2D transition probability = 10^(−(read GOP(D)/10)). Similar to the Priors derivation in hardware, a simple 64 word look-up-table can be used to derive the M2I and M2D values. If GOP(I) and GOP(D) are inputted to the HMM hardware 8 as potentially different values, then two such look-up-tables (or one resource-shared look-up-table, potentially clocked at twice the frequency of the rest of the circuit) may be utilized.
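A minimal sketch of the gap open derivation, assuming one 64 word table per penalty as described above, is set forth below; the table and function names are illustrative.

# Sketch of the gap-open transition-probability derivation described above:
# GOP(I) and GOP(D) arrive as 6-bit Phred-like values, so a 64-word table
# (or two, if the values differ) covers every possible input.
gap_open_lut = [10.0 ** (-q / 10.0) for q in range(64)]

def m2i(gop_i): return gap_open_lut[gop_i & 0x3F]   # Match -> Insert
def m2d(gop_d): return gap_open_lut[gop_d & 0x3F]   # Match -> Delete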

Furthermore, with respect to determining match-to-match transition probabilities, in various instances, the match-to-match transition probability may be calculated as: M2M transition probability = 1 − (M2I transition probability + M2D transition probability). If the M2I and M2D transition probabilities can be configured to be less than or equal to a value of ½, then in various embodiments the equation above can be implemented in hardware in a manner so as to increase overall efficiency and throughput, such as by reworking the equation to be: M2M transition probability = (0.5 − M2I transition probability) + (0.5 − M2D transition probability). This rewriting of the equation allows M2M to be derived using two 64 element look-up-tables followed by an adder, where the look-up-tables store the results.
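The reworked Match-to-Match derivation may likewise be sketched as set forth below, assuming M2I and M2D are each no greater than one half; the table and function names are illustrative.

# Sketch of the reworked Match-to-Match derivation described above: two
# 64-entry tables holding (0.5 - 10**(-q/10)) followed by a single adder
# reproduce 1 - (M2I + M2D) without a subtractor.
half_minus_lut = [0.5 - 10.0 ** (-q / 10.0) for q in range(64)]

def m2m(gop_i, gop_d):
    return half_minus_lut[gop_i & 0x3F] + half_minus_lut[gop_d & 0x3F]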

Further still, with respect to determining the Insert-to-Insert and/or Delete-to-Delete transition probabilities, the I2I and D2D transition probabilities are functions of the gap continuation probability (GCP) values inputted to the HMM hardware accelerator 8. In various instances, these GCP values may be 6-bit Phred-like values given on a per-read-base basis. The I2I and D2D values may then be derived as shown: I2I transition probability = 10^(−(read GCP(I)/10)), and D2D transition probability = 10^(−(read GCP(D)/10)). Similar to some of the other transition probabilities discussed above, the I2I and D2D values may be efficiently implemented in hardware, and may include two look-up-tables (or one resource-shared look-up-table), such as having the same form and contents as the Match-to-Indel look-up-tables discussed previously. That is, each look-up-table may have 64 words.

Additionally, with respect to determining the Insert-to-Match and/or Delete-to-Match probabilities, the I2M and D2M transition probabilities are functions of the gap continuation probability (GCP) values and may be computed as: I2M transition probability = 1 − I2I transition probability, and D2M transition probability = 1 − D2D transition probability, where the I2I and D2D transition probabilities may be derived as discussed above. A simple subtract operation to implement the equations above may be more expensive in hardware resources than simply implementing another 64 word look-up-table and using two copies of it to implement the I2M and D2M derivations. In such instances, each look-up-table may have 64 words. Of course, in all relevant embodiments, simple or complex subtract operations may be performed with suitably configured hardware.
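A minimal sketch of the gap-continuation-derived transitions, assuming one 64 word table for I2I/D2D and a complementary 64 word table for I2M/D2M, is set forth below; the names are illustrative.

# Sketch of the GCP-derived transitions described above: I2I and D2D come
# straight from 64-word tables indexed by the 6-bit GCP value, and I2M / D2M
# are their complements, which may themselves be held in a further table
# rather than computed with a subtractor.
gap_cont_lut  = [10.0 ** (-q / 10.0) for q in range(64)]   # I2I, D2D
gap_close_lut = [1.0 - p for p in gap_cont_lut]            # I2M, D2M

def i2i(gcp_i): return gap_cont_lut[gcp_i & 0x3F]
def d2d(gcp_d): return gap_cont_lut[gcp_d & 0x3F]
def i2m(gcp_i): return gap_close_lut[gcp_i & 0x3F]
def d2m(gcp_d): return gap_close_lut[gcp_d & 0x3F]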

FIG. 13 provides the circuitry 17 a for a simplified calculation for HMMtransition probabilities and Priors, as described above, which supportsthe general state transition diagram of FIG. 12. As can be seen withrespect to FIG. 13, in various instances, a simple HMM hardwareaccelerator architecture 17 a is presented, which accelerator may beconfigured to include separate GOP values for Insert and Deletetransitions, and/or there may be separate GCP values for Insert andDelete transitions. In such an instance, the cost of generating theseven unique transition probabilities and one Prior each clock cycle maybe configured as set forth below: eight 64 word look-up-tables, one 128word look-up-table, and one adder.

Further, in various instances, the hardware 2, as presented herein, maybe configured so as to fit as many HMM engine instances 13 as possibleonto the given chip target (such as on an FPGA, sASIC, or ASIC). In suchan instance, the cost to implement the transition probabilities andpriors generation logic 17 a can be substantially reduced relative tothe costs as provided by the below configurations. Firstly, rather thansupporting a more general version of the state transitions, such as setforth in FIG. 13, e.g., where there may be separate values for GOP(I)and GOP(D), rather, in various instances, it may be assumed that the GOPvalues for insert and delete transitions are the same for a given base.This results in several simplifications to the hardware, as indicatedabove.

In such instances, only one 64 word look-up-table may be employed so asto generate a single M2Indel value, replacing both the M2I and M2Dtransition probability values, whereas two tables are typically employedin the more general case. Likewise, only one 64 word look-up-table maybe used to generate the M2M transition probability value, whereas twotables and an add may typically be employed in the general case, as M2Mmay now be calculated as 1−2×M2Indel.

Secondly, the assumption may be made that the sequencer-dependent GCP values for both insert and delete are the same AND that this value does not change over the course of an HMM job 20. This means that: a single Indel2Indel transition probability may be calculated instead of separate I2I and D2D values, using one 64 word look-up-table instead of two tables; and a single Indel2Match transition probability may be calculated instead of separate I2M and D2M values, using one 64 word look-up-table instead of two tables.

Additionally, a further simplifying assumption can be made that assumes the Insert2Insert and Delete2Delete (I2I and D2D) and Insert2Match and Delete2Match (I2M and D2M) values are not only identical between insert and delete transitions, but may be static for the particular HMM job 20. Thus, the four look-up-tables associated in the more general architecture with the I2I, D2D, I2M, and D2M transition probabilities can be eliminated altogether. In various of these instances, the static Indel2Indel and Indel2Match probabilities could be made to be entered via software or via an RTL parameter (and so would be bitstream programmable in an FPGA). In certain instances, these values may be made bitstream-programmable, and in certain instances, a training mode may be implemented employing a training sequence so as to further refine transition probability accuracy for a given sequencer run or genome analysis.

FIG. 14 sets forth what the new state transition 17 b diagram may look like when implementing these various simplifying assumptions. Specifically, FIG. 14 sets forth the simplified HMM state transition diagram depicting the relationship between GOP, GCP, and transition probabilities with the simplifications set forth above.

Likewise, FIG. 15 sets forth the circuitry 17 a,b for the HMM transition probabilities and priors generation, which supports the simplified state transition diagram of FIG. 14. As seen with respect to FIG. 15, a circuit realization of that state transition diagram is provided. Thus, in various instances, for the HMM hardware accelerator 8, the cost of generating the transition probabilities and one Prior each clock cycle reduces to: two 64 word look-up-tables and one 128 word look-up-table.
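Under all of the simplifying assumptions above, the per-cycle generation may be modeled as set forth below; the static Indel2Indel value shown is only an illustrative placeholder, and the names are assumptions rather than values taken from the figures.

# Hedged sketch of the fully simplified per-cycle generation of FIG. 15: with a
# shared insert/delete GOP and a static, job-level GCP, each clock needs only
# the Priors table (128 words), one gap-open table, and one match-match table
# (64 words each). Static values here are illustrative placeholders.
INDEL2INDEL = 0.1                  # static gap-continuation probability (assumed)
INDEL2MATCH = 1.0 - INDEL2INDEL

m2indel_lut = [10.0 ** (-q / 10.0) for q in range(64)]
m2m_lut     = [1.0 - 2.0 * p for p in m2indel_lut]   # M2M = 1 - 2 * M2Indel

def transition_probs(gop):
    """Per-base transition probabilities under the simplifying assumptions."""
    q = gop & 0x3F
    return {"M2Indel": m2indel_lut[q], "M2M": m2m_lut[q],
            "Indel2Indel": INDEL2INDEL, "Indel2Match": INDEL2MATCH}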

As set forth above, the engine control logic 15 is configured for generating the virtual matrix and/or traversing the matrix so as to reach the edge of the swath, e.g., via high-level engine state machines, where result data may be finally summed, e.g., via final sum control logic 19, and stored, e.g., via put/get logic.

Accordingly, as can be seen with respect to FIG. 16, in variousembodiments, a method for producing and/or traversing an HMM cell matrix30 is provided. Specifically, FIG. 16 sets forth an example of how theHMM accelerator control logic 15 goes about traversing the virtual cellsin the HMM matrix. For instance, assuming for exemplary purposes, a 5clock cycle latency for each multiply and each add operation, theworst-case latency through the M, I, D state update calculations wouldbe the 20 clock cycles it would take to propagate through the M updatecalculation. There are half as many operations in the I and D stateupdate calculations, implying a 10 clock cycle latency for thoseoperations.

These latency implications of the M, I, and D compute operations can be understood with respect to FIG. 16, which sets forth various examples of the cell-to-cell data dependencies. In such instances, the M and D state information of a given cell feed the D state computations of the cell in the HMM matrix that is immediately to the right (e.g., having the same read base as the given cell, but having the next haplotype base). Likewise, the M and I state information for the given cell feed the I state computations of the cell in the HMM matrix that is immediately below (e.g., having the same haplotype base as the given cell, but having the next read base). So, in particular instances, the M, I, and D states of a given cell feed the D and I state computations of cells in the next diagonal of the HMM cell matrix.

Similarly, the M, I, and D states of a given cell feed the M statecomputation of the cell that is to the right one and down one (e.g.,having both the next haplotype base AND the next read base). This cellis actually two diagonals away from the cell that feeds it (whereas, theI and D state calculations rely on states from a cell that is onediagonal away). This quality of the I and D state calculations relyingon cells one diagonal away, while the M state calculations rely on cellstwo diagonals away, has a beneficial result for hardware design.

Particularly, given these configurations, I and D state calculations maybe adapted to take half as long (e.g., 10 cycles) as the M statecalculations (e.g., 20 cycles). Hence, if M state calculations arestarted 10 cycles before I and D state calculations for the same cell,then the M, I, and D state computations for a cell in the HMM matrix 30will all complete at the same time. Additionally, if the matrix 30 istraversed in a diagonal fashion, such as having a swath 35 of about 10cells each within it (e.g., that spans ten read bases), then: The M andD states produced by a given cell at (hap, rd) coordinates (i, j) can beused by cell (i+1, j) D state calculations as soon as they are all theway through the compute pipeline of the cell at (i, j).

The M and I states produced by a given cell at (hap, rd) coordinates (i, j) can be used by cell (i, j+1) I state calculations one clock cycle after they are all the way through the compute pipeline of the cell at (i, j). Likewise, the M, I, and D states produced by a given cell at (hap, rd) coordinates (i, j) can be used by cell (i+1, j+1) M state calculations one clock cycle after they are all the way through the compute pipeline of the cell at (i, j). Taken together, the above points establish that very little dedicated storage is needed for the M, I, and D states along the diagonal of the swath path that spans the swath length, e.g., of ten reads. In such an instance, only the registers required to delay the cell (i, j) M, I, and D state values by one clock cycle, for use in the cell (i+1, j+1) M calculations and the cell (i, j+1) I calculations, are needed. Moreover, there is somewhat of a virtuous cycle here: because the M state computations for a given cell are begun 10 clock cycles before the I and D state calculations for that same cell, the new M, I, and D states for any given cell are natively output simultaneously.

In view of the above, and as can be seen with respect to FIG. 16, theHMM accelerator control logic 15 may be configured to process the datawithin each of the cells of the virtual matrix 30 in a manner so as totraverse the matrix. Particularly, in various embodiments, operationsstart at cell (0,0), with M state calculations beginning 10 clock cyclesbefore I and D state calculations begin. The next cell to traverseshould be cell (1,0). However, there is a ten cycle latency after thestart of I and D calculations before the results from cell (0,0) will beavailable. The hardware, therefore, inserts nine “dead” cycles into thecompute pipeline. These are shown as the cells with haplotype index lessthan zero in FIG. 16.

After completing the dead cycle that has an effective cell position in the matrix of (−9,−9), the M, I, and D state values for cell (0,0) are available. These (e.g., the M and D state outputs of cell (0,0)) may now be used straight away to start the D state computations of cell (1,0). One clock cycle later, the M, I, and D state values from cell (0,0) may be used to begin the I state computations of cell (0,1) and the M state computations of cell (1,1).

The next cell to be traversed may be cell (2,0). However, there is a tencycle latency after the start of I and D calculations before the resultsfrom cell (1,0) will be available. The hardware, therefore, insertseight dead cycles into the compute pipeline. These are shown as thecells with haplotype index less than zero, as in FIG. 16 along the samediagonal as cells (1,0) and (0,1). After completing the dead cycle thathas an effective cell position in the matrix of (−8, −9), the M, I, andD state values for cell (1,0) are available. These (e.g., the M and Dstate outputs of cell (1,0)) are now used straight away to start the Dstate computations of cell (2,0).

One clock cycle later, the M, I, and D state values from cell (1,0) may be used to begin the I state computations of cell (1,1) and the M state computations of cell (2,1). The M and D state values from cell (0,1) may then be used at that same time to start the D state calculations of cell (1,1). One clock cycle later, the M, I, and D state values from cell (0,1) are used to begin the I state computations of cell (0,2) and the M state computations of cell (1,2).

Now, the next cell to traverse may be cell (3,0). However, there is a ten-cycle latency after the start of I and D calculations before the results from cell (2,0) will be available. The hardware, therefore, inserts seven dead cycles into the compute pipeline. These are again shown as the cells with haplotype index less than zero in FIG. 16, along the same diagonal as cells (2,0), (1,1), and (0,2). After completing the dead cycle that has an effective cell position in the matrix of (−7,−9), the M, I, and D state values for cell (2,0) are available. These (e.g., the M and D state outputs of cell (2,0)) are now used straight away to start the D state computations of cell (3,0). And, so, computation for another ten cells in the diagonal begins.

Such processing may continue until the end of the last full diagonal in the swath 35 a, which, in this example (that has a read length of 35 and haplotype length of 14), will occur after the diagonal that begins with the cell at (hap, rd) coordinates of (13,0) is completed. After the cell (4,9) in FIG. 16 is traversed, the next cell to traverse should be cell (13,1). However, there is a ten-cycle latency after the start of the I and D calculations before the results from cell (12,1) will be available.

The hardware may be configured, therefore, to start operationsassociated with the first cell in the next swath 35 b, such as atcoordinates (0, 10). Following the processing of cell (0, 10), then cell(13, 1) can be traversed. The whole diagonal of cells beginning withcell (13, 1) is then traversed until cell (5, 9) is reached. Likewise,after the cell (5, 9) is traversed, the next cell to traverse should becell (13, 2). However, as before there may be a ten cycle latency afterthe start of I and D calculations before the results from cell (12, 2)will be available. Hence, the hardware may be configured to startoperations associated with the first cell in the second diagonal of thenext swath 35 b, such as at coordinates (1, 10), followed by cell (0,11).

Following the processing of cell (0, 11), the cell (13, 2) can betraversed, in accordance with the methods disclosed above. The wholediagonal 35 of cells beginning with cell (13,2) is then traversed untilcell (6, 9) is reached. Additionally, after the cell (6, 9) istraversed, the next cell to be traversed should be cell (13, 3).However, here again there may be a ten-cycle latency period after thestart of the I and D calculations before the results from cell (12, 3)will be available. The hardware, therefore, may be configured to startoperations associated with the first cell in the third diagonal of thenext swath 35 c, such as at coordinates (2, 10), followed by cells (1,11) and (0, 12), and likewise.

This continues as indicated, in accordance with the above until the lastcell in the first swath 35 a (the cell at (hap, rd) coordinates (13, 9))is traversed, at which point the logic can be fully dedicated totraversing diagonals in the second swath 35 b, starting with the cell at(9, 10). The pattern outlined above repeats for as many swaths of 10reads as necessary, until the bottom swath 35 c (those cells in thisexample that are associated with read bases having index 30, or greater)is reached.

In the bottom swath 35, more dead cells may be inserted, as shown inFIG. 16 as cells with read indices greater than 35 and with haplotypeindices greater than 13. Additionally, in the final swath 35 c, anadditional row of cells may effectively be added. These cells areindicated at line 35 in FIG. 16, and relate to a dedicated clock cyclein each diagonal of the final swath where the final sum operations areoccurring. In these cycles, the M and I states of the cell immediatelyabove are added together, and that result is itself summed with arunning final sum (that is initialized to zero at the left edge of theHMM matrix 30).

Taking the discussion above as context, and in view of FIG. 16, it is possible to see that, for this example of read length of 35 and haplotype length of 14, there are 102 dead cycles, 14 cycles associated with final sum operations, and 20 cycles of pipeline latency, for a total of 102+14+20=136 cycles of overhead. It can also be seen that, for any HMM job 20 with a read length greater than 10, the dead cycles in the upper left corner of FIG. 16 are independent of read length. It can also be seen that the dead cycles at the bottom and bottom right portion of FIG. 16 are dependent on read length, with the fewest dead cycles for reads having mod(read length, 10)=9 and the most dead cycles for mod(read length, 10)=0. It can further be seen that the overhead cycles become a smaller total percentage of HMM matrix 30 evaluation cycles as the haplotype lengths increase (bigger matrix, partially fixed number of overhead cycles) or as the read lengths increase (note: this refers to the percentage of overhead associated with the final sum row in the matrix being reduced as the read length, i.e., the row count, increases). Using such histogram data from representative whole human genome runs, it has been determined that traversing the HMM matrix in the manner described above results in less than 10% overhead for whole genome processing.
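For illustration only, the following Python sketch (not part of the described hardware) tallies the overhead components named above for one job; the dead-cycle count is supplied by the caller, since it depends on the exact traversal schedule (102 in the read-length-35, haplotype-length-14 example).

```python
def hmm_overhead_estimate(read_len, hap_len, dead_cycles, pipeline_latency=20):
    """Rough overhead bookkeeping for one HMM job, assuming one extra row of
    cells is appended for the final-sum operations (one cycle per haplotype
    column) and a fixed compute-pipeline latency."""
    final_sum_cycles = hap_len                 # one dedicated cycle per haplotype column
    overhead = dead_cycles + final_sum_cycles + pipeline_latency
    payload = read_len * hap_len               # productive cell evaluations
    return overhead, overhead / (overhead + payload)

# Mirrors the read-length-35, haplotype-length-14 example discussed above.
cycles, frac = hmm_overhead_estimate(read_len=35, hap_len=14, dead_cycles=102)
print(cycles, f"{frac:.1%}")   # total overhead cycles and overhead fraction for this small job
```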

Further methods may be employed to reduce the number of overhead cycles, including: having dedicated logic for the final sum operations rather than sharing adders with the M and D state calculation logic, which eliminates one row of the HMM matrix 30; and using dead cycles to begin HMM matrix operations for the next HMM job in the queue.

Each grouping of ten rows of the HMM matrix 30 constitutes a “swath” 35 in the HMM accelerator function. It is noted that the length of the swath may be increased or decreased so as to meet the efficiency and/or throughput demands of the system. Hence, the swath length may be about five rows or less to about fifty rows or more, such as about ten rows to about forty-five rows, for instance, about fifteen or about twenty rows to about forty rows or about thirty-five rows, including about twenty-five rows to about thirty rows of cells in length.

With the exceptions noted in the section above, related to harvesting cycles that would otherwise be dead cycles at the right edge of the matrix of FIG. 16, the HMM matrix may be processed one swath at a time. As can be seen with respect to FIG. 16, the states of the cells in the bottom row of each swath 35 a feed the state computation logic in the top row of the next swath 35 b. Consequently, there may be a need to store (put) and retrieve (get) the state information for those cells in the bottom row, or edge, of each swath.

The logic to do this may include one or more of the following: when the M, I, and D state computations for a cell in the HMM matrix 30 complete for a cell with mod(read index, 10)=9, save the result to the M, I, D state storage memory; and when the M and I state computations (e.g., where D state computations do not require information from cells above them in the matrix) for a cell in the HMM matrix 30 begin for a cell with mod(read index, 10)=0, retrieve the previously saved M, I, and D state information from the appropriate place in the M, I, D state storage memory. Note in these instances that the M, I, and D state values that feed row 0 (the top row) M and I state calculations in the HMM matrix 30 are simply a predetermined constant value and do not need to be recalled from memory, as is true for the M and D state values that feed column 0 (the left column) D state calculations.
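The following is a minimal software sketch of the put/get behavior just described, assuming a swath height of ten rows and a simple dictionary standing in for the M, I, D state storage memory; it is illustrative only and not the accelerator's actual logic.

```python
SWATH_ROWS = 10
INIT_STATE = (0.0, 0.0, 0.0)   # predetermined constant feeding row 0 M/I and column 0 D calculations

def on_cell_complete(state_mem, hap_idx, read_idx, mid_state):
    # Put: a cell in the bottom row of a swath saves its M, I, D result,
    # keyed by haplotype column, for the top row of the next swath.
    if read_idx % SWATH_ROWS == SWATH_ROWS - 1:
        state_mem[hap_idx] = mid_state

def states_from_above(state_mem, hap_idx, read_idx):
    # Get: the top row of the matrix uses the predetermined constant, the top
    # row of each later swath reads back the saved states, and all other rows
    # are fed directly through the compute pipeline (None here).
    if read_idx == 0:
        return INIT_STATE
    if read_idx % SWATH_ROWS == 0:
        return state_mem[hap_idx]
    return None
```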

As noted above, the HMM accelerator may or may not include a dedicated summing resource in the HMM hardware accelerator that exists simply for the purpose of the final sum operations. However, in particular instances, as described herein, an additional row may be added to the bottom of the HMM matrix 30, and the clock cycles associated with this extra row may be used for final summing operations. For instance, the sum itself may be achieved by borrowing (e.g., as per FIG. 13) an adder from the M state computation logic to do the M+I operation, and further by borrowing an adder from the D state computation logic to add the newly formed M+I sum to the running final sum accumulation value. In such an instance, the control logic to activate the final sum operation may kick in whenever the read index that guides the HMM traversing operation is equal to the length of the inputted read sequence for the job. These operations can be seen at line 34 toward the bottom of the sample HMM matrix 30 of FIG. 16.

Hence, as can be seen above, in one implementation, the variant callermay make use of the mapper and/or aligner engines to determine thelikelihood as to where various reads originated, such as with respect toa given location. In such instances, the variant caller may beconfigured to detect the underlying sequence at that location, such asindependently of other regions not immediately adjacent to it. This isparticularly useful and works well when the region of interest does notresemble any other region of the genome over the span of a single read(or a pair of reads for paired-end sequencing). However, a significantfraction of the human genome does not meet this criterion, which canmake variant calling, e.g., the process of reconstructing a subject'sgenome from the reads that an NGS produces, challenging.

Particularly, though DNA sequencing has improved dramatically, variantcalling remains a difficult problem, largely due to the genome'sredundant structure. As disclosed herein, however, the complexitiespresented by the genome's redundancy may be overcome, at least in part,from a perspective driven by short read data. More particularly, thedevices, systems, and methods of employing the same as disclosed hereinmay be configured in such a manner so as to focus on Homologous orSimilar regions that may otherwise have been characterized by lowvariant calling accuracy. In certain instances, such low variant callingaccuracy may stem from difficulties observed in read mapping andalignments such as in homologous regions that typically may result invery low read MAPQs. Accordingly, presented herein are strategicimplementations that accurately call variants (SNPs, INDELs, and thelike) in homologous regions by jointly considering the informationpresent in the homologous regions.

For instance, many regions of the genome are homologous, e.g., they have near-identical copies elsewhere, and as a result, the true source location of a read may be subject to considerable uncertainty. Specifically, if a group of reads is mapped with low confidence, a typical variant caller may ignore the reads, even though they contain useful information. In other instances, if a read is mismapped (e.g., the primary alignment is not the true source of the read), it can result in detection errors. More specifically, previously implemented short-read sequencing technologies have been susceptible to these problems, and conventional detection methods often leave large regions of the genome in the dark. In some instances, long-read sequencing can mitigate these problems, but it typically has much higher cost and/or higher error rates, takes longer, and/or suffers from other shortcomings. Therefore, in various instances, instead of considering each region in isolation and/or instead of performing and analyzing long read sequencing, multi-region joint detection (MRJD) methodologies may be employed, such as where the MRJD considers multiple, e.g., all, locations from which a group of reads may have originated and attempts to detect the underlying sequences together, e.g., jointly, using all available information, which may be regardless of confidence and/or certainty scores.

For instance, for a diploid organism with statistically uniform coverage, a brute force Bayesian calculation may be performed. However, in such a brute force MRJD computation, the complexity of the calculation grows rapidly with the number of regions N and the number of candidate haplotypes K to be considered. Particularly, to consider all combinations of candidate haplotypes, the number of candidate solutions for which to calculate probabilities is exponential. For example, as described in greater detail below, in a brute force implementation, the number of candidate haplotypes grows exponentially with the number of active positions, where, if a graph-assembly technique is used to generate the list of candidate haplotypes, such as in a De Bruijn graph disclosed herein, the number of active positions is the number of independent “bubbles” in the graph. Hence, such a brute-force calculation can be prohibitively expensive to implement, and as such brute force Bayesian calculations can be prohibitively complex.

Accordingly, in one aspect, as set forth in FIG. 17, a method to reducethe complexity of such brute force calculations is herein provided. Forinstance, as disclosed above, though the speed and accuracy of DNAsequencing has improved dramatically, especially with respect to themethods disclosed herein, variant calling, e.g., the process ofreconstructing a subject's genome from the reads a sequencer produces,remains a difficult problem, largely due to the genome's redundantstructure. The devices, systems, and methods disclosed herein thereforeare configured to reduce the complexities presented by the genome'sredundancy from a perspective driven by short read data in contrast tolong read sequencing. In particular, provided herein are methods forperforming very long read detection that accounts for homologous and/orsimilar regions of the genome that are usually characterized by lowvariant calling accuracy without necessarily having to perform long readsequencing.

Specifically, as can be seen with respect to FIG. 17, a high-level processing chain is provided, such as where the processing chain may include one or more of the following steps: identifying and inputting homologous regions, performing pre-processing of the input homologous regions, performing a pruned very long read detection (VLRD) or multi-region joint detection (MRJD), and outputting a variant call file. Particularly with respect to identifying homologous regions, a mapped, aligned, and/or sorted SAM and/or BAM file, e.g., a CRAM, may be used as the primary input to a multi-region joint detection processing engine implementing an MRJD algorithm, as described herein. The MRJD processing engine may be part of an integrated circuit such as a CPU and/or GPU and/or quantum computing platform, running software, e.g., a quantum algorithm, or implemented within an FPGA, ASIC, or the like. For instance, the above disclosed mapper and/or aligner may be used to generate a CRAM file, e.g., with settings to output N secondary alignments for each read along with the primary alignments. These primary and secondary reads may then be used to identify a list of homologous regions, which homologous regions may be computed based on a user-defined similarity threshold between the N regions of the reference genome. This list of identified homologous regions may then be fed to the pre-processing stage of a suitably configured MRJD module.

Accordingly, in the pre-processing stage, for every set of homologousregions, a joint-pileup may first be generated such as by using theprimary alignments from one or more, e.g., every, region in the set.See, for instance, FIG. 19. Using this joint pileup, a list ofactive/candidate variant positions (SNPS/INDELs) may then be generatedwhereby each of these candidate variants may be processed and evaluatedby the MRJD pre-processing engine(s). To reduce computation complexity,a connection matrix may be computed that may be used to define the orderof processing of the candidate variants.

In such implementations, the multi-region joint detection algorithm evaluates each identified candidate variant based on the processing order defined in the generated connection matrix. First, one or more candidate joint diplotypes (G_(i)) may be generated given a candidate variant. Next, the a-posteriori probabilities of each of the joint diplotypes (P(G_(i)|R)) may be calculated. From these a-posteriori probabilities a genotype matrix may be computed. Next, the N diplotypes with the lowest a-posteriori probabilities may be pruned so as to reduce the computational complexity of the calculations. Then the next candidate variant that provides evidence for the current candidate variant being evaluated may be included, and the above process repeated. Having included information from one or more, e.g., all, of the candidate variants from one or more, e.g., all, regions in the homologous region set for the current variant, a variant call may be made from the final genotyping matrix. Each of the active positions, therefore, may be evaluated in the manner above, thereby resulting in a final VCF file.

Particularly, as can be seen with respect to FIG. 18, an MRJD preprocessing step may be implemented, such as including one or more of the following steps or blocks: the identified and assembled joint pile-up is loaded, a candidate variant list is then created from the assembled joint pile-up, and a connection matrix is computed. Particularly, in various instances, a preprocessing methodology may be performed, such as prior to performing one or more variant call operations, such as a multi-region joint detection operation. Such operations may include one or more preprocessing blocks, including: steps pertaining to the loading of joint pile-ups, generating a list of variant candidates from the joint pileups, and computing a connection matrix. Each of the blocks and potential steps associated therewith will now be discussed in greater detail.

Specifically, a first joint pile up pre-processing block may be includedin the analysis procedure. For example, various reference regions for anidentified span may be extracted, such as from the mapped and/or alignedreads. Particularly, using the list of homologous regions, a jointpileup for each set of homologous regions may be generated. Next, auser-defined span may be used to extract the N reference regionscorresponding to N homologous regions within a set. Subsequently, one ormore, e.g., all, of the reference regions may be aligned, such as byusing a Smith-Waterman alignment, which may be used to generate auniversal coordinate system of all the bases in the N reference regions.Further, all the primary reads corresponding to each region may then beextracted from the input SAM or BAM file and be mapped to the universalcoordinates. This mapping may be done, as described herein, such as byusing the alignment information (CIGAR) present in a CRAM file for eachread. In the scenario where some reads pairs were not previously mapped,the reads may be mapped and/or aligned, e.g., Smith-Waterman aligned, toits respective reference region.

More particularly, once a joint pile up has been generated and loaded,see for instance, FIG. 19, a candidate variant list may be created, suchas from the joint pile up. For instance, a De Bruijn graph (DBG) orother assembly graph may be produced so as to extract various candidatevariants (SNPs/Indels) that may be identified from the joint pileup.Once the DBG is produced the various bubbles in the graph can be minedso as to derive a list of variant candidates.

Particularly, given all the reads, a graph may be generated using eachreference region as a backbone. All of the identified candidate variantpositions can then be aligned to universal coordinates. A connectionmatrix may then be computed, where the matrix defines the order ofprocessing of the active positions, which may be a function of the readlength and/or insert size. As referenced herein, FIG. 19 shows anexample of a joint pileup of two homologous regions in chromosome 1.Although this pileup is with reference to two homologous regions ofchromosome 1, this is for exemplary purposes only as the production ofthe pileup process may be used for any and all homologous regionsregardless of chromosome.

As can be seen with respect to FIG. 20, a candidate variant list may be created as follows. First, a joint pileup may be formed and a De Bruijn graph (DBG) or other assembly graph may be constructed, in accordance with the methods disclosed herein. The DBG may then be used to extract the candidate variants from the joint pileups. The construction of the DBG is performed in such a manner as to generate bubbles, indicating variations, representing alternate pathways through the graph, where each alternate path is a candidate haplotype. See, for instance, FIGS. 20 and 21.

Accordingly, the various bubbles in the graph represent the list of candidate variant haplotype positions. Hence, given all of the reads, the DBG may be generated using each reference region as a backbone. Then all of the candidate variant positions can be aligned to universal coordinates. Specifically, FIG. 20 illustrates a flow chart setting forth the process of generating a DBG and using the same to produce candidate haplotypes. More specifically, the De Bruijn graph may be employed in order to create the candidate variant list of SNPs and INDELs. Given that there are N regions that are being jointly processed by MRJD, N De Bruijn graphs may be constructed. In such an instance, every graph may use one reference region as a backbone and all of the reads corresponding to the N regions.

For instance, in one methodological implementation, after the DBG isconstructed, the candidate haplotypes may be extracted from the DeBruijn graph based on the candidate events. However, when employing anMRJD pre-processing protocol, as described herein, N regions may bejointly processed, such as where the length of the regions can be a fewthousand bases or more, and the number of haplotypes to be extracted cangrow exponentially very quickly. Accordingly, in order to reduce thecomputational complexity, instead of extracting entire haplotypes, onlythe bubbles need be extracted from the graphs that are representative ofthe candidate variants.

An example of bubble structures formed in a De Bruijn graph is shown inFIG. 21. A number of regions to be processed jointly are identified.This determines one of two processing pathways that may be followed. Ifjoint regions are identified all the reads may be used to form a DBG.Bubbles showing possible variants may be extracted so as to identify thevarious candidate haplotypes. Specifically, for each bubble a SWalignment may be performed on the alternate paths to the referencebackbone. From this the candidate variants may be extracted and theevents from each graph may be stored.

However, in other instances, once the first process has been performed, so as to generate one or more DBGs, and/or i is now equal to 0, then the union of all candidate events from all of the DBGs may be generated, where any duplicates may be removed. In such an instance, all candidate variants may be mapped, such as to a universal coordinate system, so as to produce the candidate list, and the candidate variant list may be sent as an input to a pruning module, such as the MRJD module. An example of only performing bubble extraction, instead of extracting the entire haplotypes, is shown in FIG. 22. In this instance, it is only the bubble region showing possible variants that is extracted and processed, as described herein.

Specifically, once the representative bubbles have been extracted, theglobal alignment, e.g., Smith-Waterman alignment, of the bubble path andthe corresponding reference backbone may be performed to get thecandidate variant(s) and its position in the reference. This may be donefor all extracted bubbles in all of the De Bruijn graphs. Next, theunion of all the extracted candidate variants may be taken from the Ngraphs, the duplicate candidates, if any, may be removed, and the uniquecandidate variant positions may be mapped to the universal coordinatesystem obtained from the joint pile-up. This results in a final list ofcandidate variant positions for the N regions that may act as an inputto a “Pruned” MRJD algorithm.

In particular preprocessing blocks, as described herein above, a connection matrix may be computed. For instance, a connection matrix may be used to define the order of processing of active, e.g., candidate, positions, such as a function of read length and insert size. For example, to further reduce computational complexity, a connection matrix may be computed so as to define the order of processing of identified candidate variants that are obtained from the De Bruijn graph. This matrix may be constructed and employed in conjunction with, or as, a sorting function to determine which candidate variants to process first. This connection matrix, therefore, may be a function of the mean read length and the insert size of the paired-end reads. Accordingly, for a given candidate variant, other candidate variant positions that are at integral multiples of the insert size or within the read length have higher weights compared to the candidate variants at other positions. This is because these candidate variants are more likely to provide evidence for the current variant being evaluated. An exemplary sorting function, as implemented herein, is shown in FIG. 23 for a mean read length of 101 and an insert size of 300.
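As a hedged illustration of such a weighting, the sketch below assigns larger weights to candidate positions that fall within a read length of the current candidate, or near integer multiples of the insert size from it; the particular kernel, tolerance, and fallback weight are assumptions made for this example rather than the engine's actual function.

```python
import numpy as np

def connection_weight(d, read_len=101, insert_size=300, tol=0.2):
    """Illustrative pairwise weight between two candidate positions separated by d bases."""
    d = abs(d)
    if d == 0:
        return 0.0                            # no self-connection
    if d <= read_len:
        return 1.0                            # both positions covered by a single read
    k = round(d / insert_size)
    if k >= 1 and abs(d - k * insert_size) <= tol * read_len:
        return 1.0 / k                        # reachable via read pairs, decaying with distance
    return 0.1                                # weakly connected otherwise

def connection_matrix(positions, **kw):
    """Full matrix of weights; row-wise sorting of this matrix gives a processing order."""
    return np.array([[connection_weight(p - q, **kw) for q in positions] for p in positions])
```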

With respect to an MRJD pruning function, exemplary steps of a pruned MRJD algorithm, as referenced above, are set forth in FIG. 24. For instance, the input to the MRJD platform and algorithm is the joint pileup of N regions, e.g., all the candidate variants (SNPs/INDELs), the a-priori probabilities based on a mutation model, and the connection matrix. Accordingly, the input into the pruned MRJD processing platform may be the joint pile-up, the identified active positions, the generated connection matrix, and the a-priori probability model, and/or the results thereof.

Next, each candidate variant in the list can be processed, and other variants can be successively added as evidence for a current candidate being processed using the connection matrix. Accordingly, given the current candidate variant and any supporting candidates, candidate joint diplotypes may be generated. For instance, a joint diplotype is a set of 2N haplotypes, where N is the number of regions being jointly processed. The number of candidate joint diplotypes M is a function of the number of regions being jointly processed, the number of active/candidate variants being considered, and the number of phases. An example for generating joint diplotypes is shown below.

For P=1 active/candidate variant position being considered and N=2 regions being jointly processed: M=2^(2·N·P)=2⁴=16 candidate joint diplotypes.

Hence, for a single candidate active position, given all the reads and both the reference regions, let the two haplotypes be ‘A’ and ‘G’.

Unique haplotypes = ‘A’ and ‘G’. Candidate diplotypes = ‘AA’, ‘AG’, ‘GA’, and ‘GG’ (4 candidates for 1 region).

${{Candidate}\mspace{14mu} {Joint}\mspace{14mu} {Diplotypes}} = \begin{matrix}{\;^{\prime}{AAAA}^{\prime},} & {\;^{\prime}{AAAG}^{\prime},} & {\;^{\prime}{AAGA}^{\prime},} & {\;^{\prime}{AAGG}^{\prime}} \\{\;^{\prime}{AGAA}^{\prime},} & {\;^{\prime}{AGAG}^{\prime},} & {\;^{\prime}{AGGA}^{\prime},} & {\;^{\prime}{AGGG}^{\prime}} \\{\;^{\prime}{GAAA}^{\prime},} & {\;^{\prime}{GAAG}^{\prime},} & {\;^{\prime}{GAGA}^{\prime},} & {\;^{\prime}{GAGG}^{\prime}} \\{\;^{\prime}{GGAA}^{\prime},} & {\;^{\prime}{GGAG}^{\prime},} & {\;^{\prime}{GGGA}^{\prime},} & {\;^{\prime}{GGGG}^{\prime}}\end{matrix}$

Accordingly, using the candidate joint diplotypes, the read likelihoods can be calculated given a haplotype, for each haplotype in every candidate joint diplotype set. This may be done using an HMM algorithm, as described herein. However, in doing so, the HMM algorithm may be modified from its standard use case so as to allow for candidate variants (SNPs/INDELs) in the haplotype, which have not yet been processed, to be considered. Subsequently, the read likelihoods can be calculated given a joint diplotype (P(r_(i)|G_(m))) using the results from the modified HMM. This may be done using the formula below.

For the case of 2-region joint detection:

$G_m = [\vartheta_{11,m},\; \vartheta_{12,m},\; \vartheta_{21,m},\; \vartheta_{22,m}]$, where, in $\vartheta_{ij,m}$, $i$ is the region and $j$ is the phase.

$P(r_i \mid G_m) = \dfrac{P(r_i \mid \vartheta_{11,m}) + P(r_i \mid \vartheta_{12,m}) + P(r_i \mid \vartheta_{21,m}) + P(r_i \mid \vartheta_{22,m})}{4}$

Given P(r_(i)|G_(m)), it is straightforward to calculate P(R|G_(m)) over all the reads: $P(R \mid G_m) = \prod_i P(r_i \mid G_m)$. Next, using Bayes' formula, the a-posteriori probability (P(G_(i)|R)) may be computed from P(R|G_(i)) and the a-priori probabilities (P(G_(i))).

$P(G_i \mid R) = \dfrac{P(R \mid G_i)\, P(G_i)}{\sum_k P(R \mid G_k)\, P(G_k)}$
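A minimal sketch of this roll-up for the two-region diploid case is given below: per-read, per-haplotype likelihoods are averaged over the four haplotypes of each joint diplotype, multiplied across reads, and normalized against the a-priori probabilities with Bayes' rule. The array-based representation is an assumption of the sketch, and in practice the products would typically be carried in log space to avoid underflow.

```python
import numpy as np

def read_given_diplotype(read_hap_liks):
    """read_hap_liks: array (n_reads, 4) of P(r_i | theta) for the four haplotypes
    of one joint diplotype; returns P(r_i | G_m) as the average over haplotypes."""
    return read_hap_liks.mean(axis=1)

def posterior_over_diplotypes(per_diplotype_read_liks, priors):
    """per_diplotype_read_liks: one (n_reads, 4) array per candidate G_m;
    priors: a-priori probabilities P(G_m). Returns P(G_m | R)."""
    evid = np.array([np.prod(read_given_diplotype(liks))          # P(R | G_m)
                     for liks in per_diplotype_read_liks])
    joint = evid * np.asarray(priors)
    return joint / joint.sum()                                    # Bayes normalization
```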

Further, an intermediate genotype matrix may be calculated for eachregion given the a-posteriori probabilities for all the candidate jointdiplotypes. For each event combination in the genotype matrix thea-posteriori probabilities of all joint diplotypes supporting that eventmay be summed up. At this point, the genotype matrix may be consideredas “intermediate” because not all the candidate variants supporting thecurrent candidate have been included. However, as seen earlier, thenumber of joint diplotype candidates grows exponentially with the numberof candidate variant positions and number of regions. This in-turnexponentially increases the computation required to calculate thea-posteriori probabilities. Therefore, in order to reduce thecomputational complexity, at this stage, the number of joint diplotypesbased on the a-posteriori probabilities may be pruned so that the numberof joint diplotypes to keep may be user defined and programmable.Finally, the final genotype matrix may be updated based on auser-defined confidence metric of variants which is computed using theintermediate genotype matrix. The various steps of these processes areset forth in the process flow diagram of FIG. 24.

The process above may be repeated until all the candidate variants are included as evidence for the current candidate being processed using the connection matrix. Once all of the candidates have been included, the processing of the current candidate is done. Other stopping criteria for processing candidate variants are also possible. For example, the process may be stopped when the confidence has stopped increasing as more candidate variants are added. This analysis, as exemplified in FIG. 24, may be restarted and repeated in the same manner for all other candidate variants in the list, thereby resulting in a final variant call file at the output of MRJD. Accordingly, instead of considering each region in isolation, a Multi-Region Joint Detection protocol, as described herein, may be employed so as to consider all locations from which a group of reads may have originated as it attempts to detect the underlying sequences jointly using all available information.

Accordingly, for Multi-Region Joint Detection, an exemplary MRJDprotocol may employ one or more of the following equations in accordancewith the methods disclosed herein. Specifically, instead of consideringeach region to be assessed in isolation, MRJD considers a plurality oflocations from which a group of reads may have been originated andattempts to detect the underlying sequences jointly, such as by using asmuch as, e.g., all, the available information that is useful. Forinstance, in one exemplary embodiment:

Let N be the number of regions to be jointly processed. And let H_(k) be a candidate haplotype, k=1 . . . K, each of which may include various SNPs, insertions, and/or deletions relative to a reference sequence. Each haplotype H_(k) represents a single region along a single strand (or “phase”, e.g., maternal or paternal), and they need not be contiguous (e.g., they may include gaps or “don't care” sequences).

Let G_(m) be a candidate solution for both phases Φ=1, 2 (for a diploid organism) and all regions n=1 . . . N:

$G_m = \begin{bmatrix} G_{m,1,1} & \cdots & G_{m,1,N} \\ G_{m,2,1} & \cdots & G_{m,2,N} \end{bmatrix}$

where each element G_(m,Φ,n) is a haplotype chosen from the set of candidates {H₁ . . . H_(K)}.

First, the probability of each read may be calculated for each candidate haplotype, P(r_(i)|H_(k)), for example, by using a Hidden Markov Model (HMM). In the case of datasets with paired reads, r_(i) indicates the pair {r_(i,1), r_(i,2)}, and P(r_(i)|H_(k))=P(r_(i,1)|H_(k))P(r_(i,2)|H_(k)). In the case of datasets with linked reads (e.g., barcoded reads), r_(i) indicates the group of reads {r_(i,1) . . . r_(i,N_L)} that came from the same long molecule, and $P(r_i \mid H_k) = \prod_{n=1}^{N_L} P(r_{i,n} \mid H_k)$.

Next, for each candidate solution G_(m), m=1 . . . M, we calculate the conditional probability of each read

$P(r_i \mid G_m) = \dfrac{1}{2N} \sum_{n=1}^{N} \sum_{\Phi=1}^{2} P(r_i \mid G_{m,\Phi,n})$

and the conditional probability of the entire pileup R={r₁ . . . r_(N_R)}: $P(R \mid G_m) = \prod_{i=1}^{N_R} P(r_i \mid G_m)$.

Next, the a-posteriori probability of each candidate solution given the observed pileup is calculated: $P(G_m \mid R) = P(R \mid G_m)\, P(G_m) \,/\, \sum_{i=1}^{M} P(R \mid G_i)\, P(G_i)$, where P(G_(m)) indicates the a-priori probability of the candidate solution, which is set forth in detail here below.

Finally, the relative probability of every candidate variant V_(j) is calculated

${\frac{P\left( {Vj} \middle| R \right)}{P\left( {ref} \middle| R \right)} = {\Sigma_{{n|{Gm}} = {> {vj}}}{{P\left( {Gm} \middle| R \right)}/\Sigma_{{m|{Gm}} = {> {ref}}}}{P\left( {Gm} \middle| R \right)}}},$

such as where G_(m)→V_(j) indicates that G_(m) supports variant V_(j), and G_(m)→ref indicates that G_(m) supports the reference. In a VCF file, this may be reported as a quality score:

${{QUAL}\left( V_{j} \right)} = {{- 10}\log_{10}{\frac{P\left( {Vj} \middle| R \right)}{P\left( {ref} \middle| R \right)}.}}$

An exemplary process for performing various variant calling operationsis set forth herein with respect to FIG. 25 where a conventional andMRJD detection process are compared. Specifically, FIG. 25 illustrates ajoint pileup of paired reads for two regions whose reference sequencesdiffer by only 3 bases over the range of interest. All the reads areknown to come from either region #1 or region #2, but it is not knownwith certainty from which region any individual read originated. Note,as described above, that the bases are only shown for the positionswhere the two references differ, e.g., bubble regions, or where thereads differ from the reference. These regions are referred to as theactive positions. All other positions can be ignored, as they don'taffect the calculation.

Accordingly, as can be seen with respect to FIG. 25, in a conventionaldetector, the read pairs 1-16 would be mapped to region #2, and thesealone would be used for variant calling in region #2. All of these readsmatch the reference for region #2, so no variants would be called.Likewise, read pairs 17-23 would be mapped to region #1, and these alonewould be used for variant calling in region #1. As can be seen, all ofthese reads match the reference for region #1, so no variants will becalled. However, read pairs 24-32 map equally well to region #1 andregion #2 (each has a one-base difference to ref #1 and to ref #2), sothe mapping is indeterminate, and a typical variant caller would simplyignore these reads. As such, a conventional variant caller would make novariant calls for either region, as seen in FIG. 25.

However, with MRJD, FIG. 25 illustrates that the result is completely different from that obtained employing conventional methods. The relevant calculations are set forth below. In this instance, N=2 regions. Additionally, there are three positions, each with 2 candidate bases (one can safely ignore bases whose count is sufficiently low, and in this example the count is zero on all but 2 bases in each position). If all combinations are considered, this will yield K=2³=8 candidate haplotypes: H₁=CAT, H₂=CAA, H₃=CCT, H₄=CCA, H₅=GAT, H₆=GAA, H₇=GCT, H₈=GCA.

In a brute-force calculation where all combinations of all candidate haplotypes are considered, the number of candidate solutions is M=K^(2N)=8^(2·2)=4096, and P(G_(m)|R) may be calculated for each candidate solution G_(m). The following illustrates this calculation for two candidate solutions:

$G_{m1} = \begin{bmatrix} \mathrm{CAT} & \mathrm{GCA} \\ \mathrm{CAT} & \mathrm{GCA} \end{bmatrix}, \quad G_{m2} = \begin{bmatrix} \mathrm{CAT} & \mathrm{GCA} \\ \mathrm{CCT} & \mathrm{GCA} \end{bmatrix}$

Where G_(m1) has no variants (this is the solution found by a conventional detector), and G_(m2) has a single heterozygous SNP A→C in position #2 of region #1.

The probability P(r_(i)|H_(k)) depends on various factors including the base quality and other parameters of the HMM. It may be assumed that only base call errors are present and all base call errors are equally likely, so $P(r_i \mid H_k) = (1 - p_e)^{N_p(i) - N_e(i)} (p_e/3)^{N_e(i)}$, where p_(e) is the probability of a base call error, N_(p)(i) is the number of active base position(s) overlapped by read i, and N_(e)(i) is the number of errors for read i, assuming haplotype H_(k). Accordingly, it may be assumed that p_(e)=0.01, which corresponds to a base quality of phred 20. The table set forth in FIG. 26 shows P(r_(i)|H_(k)) for all read pairs and all candidate haplotypes. The two far right columns show P(r_(i)|G_(m1)) and P(r_(i)|G_(m2)), with the product at the bottom. FIG. 26 shows that P(R|G_(m1))=3.5×10⁻³⁰ and P(R|G_(m2))=2.2×10⁻¹⁵, a difference of 15 orders of magnitude in favor of G_(m2).
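The per-read likelihood model used in this example can be written compactly as below; the string representation of the overlapped active positions (with '-' marking a position the read does not cover) is an assumption made for the sketch.

```python
def read_likelihood(read_bases, hap_bases, pe=0.01):
    """P(r_i | H_k) = (1 - pe)**(Np_i - Ne_i) * (pe/3)**Ne_i, comparing the read to a
    haplotype only at the active positions the read overlaps."""
    covered = [(r, h) for r, h in zip(read_bases, hap_bases) if r != '-']
    n_p = len(covered)                                  # active positions overlapped by the read
    n_e = sum(1 for r, h in covered if r != h)          # mismatches at those positions
    return (1 - pe) ** (n_p - n_e) * (pe / 3) ** n_e

print(read_likelihood("CAT", "CAA"))   # one mismatch over three active positions
```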

The a-posteriori probabilities P(G_(m)|R) depend on the a-priori probabilities P(G_(m)). To complete this example, a simple independent identically distributed (IID) model may be assumed, such that the a-priori probability of a candidate solution with N_(v) variants is $(1 - p_v)^{N \cdot N_p - N_v} (p_v/9)^{N_v}$, where N_(p) is the number of active positions (3 in this case) and p_(v) is the probability of a variant, assumed to be 0.01 in this example. This yields P(G_(m1)|R)=7.22e-13 and P(G_(m2)|R)=0.500. It is noted that G_(m2) is heterozygous over region #1, and all heterozygous pairs of haplotypes have a mirror-image representation with the same probability (obtained by simply swapping the phases). In this case, the probabilities for G_(m2) and its mirror image sum to 1.000. Calculating the probabilities of individual variants, a heterozygous A→C SNP at position #2 of region #1, with a quality score of phred 50.4, can be seen.

Accordingly, as can be seen, there is an immense computational complexity to performing a brute force variant calling operation, which complexity can be reduced by performing multi-region joint detection, as described herein. For instance, the complexity of the above calculations grows rapidly with the number of regions N and the number of candidate haplotypes K. To consider all combinations of candidate haplotypes, the number of candidate solutions for which to calculate probabilities is M=K^(2N). In a brute force implementation, the number of candidate haplotypes is K=2^(N_p), where N_(p) is the number of active positions (e.g., as exemplified above, if graph-assembly techniques are used to generate the list of candidate haplotypes, then N_(p) is the number of independent bubbles in the graph). Hence, a mere brute-force calculation can be prohibitively expensive to implement. For example, if N=3 and N_(p)=10, the number of candidate solutions is M=2^(2·3·10)=2⁶⁰≈10¹⁸. However, in practice, it is not uncommon to have values of N_(p) much higher than this.

Consequently, because a brute force Bayesian calculation can be prohibitively complex, the following description sets forth further methods for reducing the complexity of such calculations. For instance, in a first step of another embodiment, starting with a small number of positions N_(p)^(j) (or even a single position N_(p)^(j)=1), the Bayesian calculation may be performed over those positions. At the end of the calculation, the candidates whose probability falls below a predefined threshold may be eliminated, such as in a pruning-of-the-tree function, as described above. In such an instance, the threshold may be adaptive.

Next, in a second step, the number of positions N_(p)^(j) may be increased by a small number ΔN_(p) (such as one: N_(p)^(j+1)=N_(p)^(j)+ΔN_(p)), and the surviving candidates can be combined with one or more, e.g., all, possible candidates at the new position(s), such as in a growing-the-tree function. These steps of (1) performing the Bayesian calculation, (2) pruning the tree, and (3) growing the tree may then be repeated, e.g., sequentially, until a stopping criterion is met. The threshold history may then be used to determine the confidence of the result (e.g., the probability that the true solution was or was not found). This process is illustrated in the flow chart set forth in FIG. 27.
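A skeleton of this grow-and-prune loop is sketched below; the Bayesian scoring callback and the tuple representation of partial solutions are placeholders standing in for the full MRJD calculation, and the phred-style threshold mirrors the example threshold used later in this description.

```python
def pruned_search(initial_candidates, new_position_alleles, bayes_over, prune_phred=60.0):
    """Iteratively grow the candidate tree by one position, score it, and prune.
    bayes_over(candidates) must return one posterior probability per candidate."""
    threshold = 10 ** (-prune_phred / 10.0)
    candidates, p_pruned = list(initial_candidates), 0.0
    for alleles in new_position_alleles:                          # grow: add the next position
        candidates = [c + (a,) for c in candidates for a in alleles]
        posteriors = bayes_over(candidates)                       # Bayesian calculation over included positions
        p_pruned += sum(p for p in posteriors if p < threshold)   # track pruned probability mass
        candidates = [c for c, p in zip(candidates, posteriors) if p >= threshold]
    return candidates, p_pruned
```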

It is to be understood that there are a variety of possible variationsto this approach. For instance, as indicated, the pruning threshold maybe adaptive, such as based on the number of surviving candidates. Forinstance, a simple implementation may set the threshold to keep thenumber of candidates below a fixed number, while a more sophisticatedimplementation may set the threshold based on a cost-benefit analysis ofincluding additional candidates. Further, a simple stopping criteria maybe that a result has been found with a sufficient level of confidence,or that the confidence on the initial position has stopped increasing asmore positions are added. Further still, a more sophisticatedimplementation may perform some type of cost-benefit analysis ofcontinuing to add more positions. Additionally, as can be seen withrespect to FIG. 27, the order in which new positions are added maydepend on several criteria, such as the distance to the initialposition(s) or how highly connected these positions are to thealready-included positions (e.g., the amount of overlap with the pairedreads).

A useful feature of this algorithm is that the probability that the true solution was not found can be quantified. For instance, a useful estimate is obtained by simply summing the probabilities of all pruned branches at each step: $P_{\mathrm{pruned}} = P_{\mathrm{pruned}} + \sum_{m \in \mathrm{pruned\ set}} P(G_m^{(j)} \mid R)$. Such an estimate is useful for calculating the confidence of the resulting variant calls:

$\frac{P\left( {vj} \middle| R \right)}{P\left( {ref} \middle| R \right)} = {{\Sigma_{{m|{Gm}} = {> {vj}}}{P\left( {Gm} \middle| R \right)}} + {{{Ppruned}/\Sigma_{{m|{Gm}} = {> {ref}}}}{P\left( {Gm} \middle| R \right)}} + {{Ppruned}.}}$

Good confidence estimates are essential for producing good Receiver Operating Characteristic (ROC) curves. This is a key advantage of this pruning method over other ad hoc complexity reductions.

Returning to the example pileup of FIG. 25, and starting from theleft-most position (position #1) and working toward the right one baseposition at a time, using a pruning threshold of phred 60 on eachiteration: Let {G_(m) ^(j), m=1 . . . M_(j)} represent the candidatesolutions on the j-th iteration. FIG. 28 shows the candidate solutionson the first iteration, representing all combinations of bases C and G,listed in order of decreasing probability. For any solution withequivalent mirror-image representations (obtained by swapping thephases), only a single representation is shown here. The probabilitiesfor all candidate solutions can be calculated, and those probabilitiesbeyond the pruning threshold (indicated by the solid line in the FIG.28) can be dropped. As can be seen with respect to FIG. 28, as a resultof the pruning methods disclosed herein, six candidates survive.

Next, as can be seen with respect to FIG. 29, the tree can be grown byfinding all combinations of the surviving candidates from iteration #1and candidate bases (C and A) in the position #2. A partial list of thenew candidates is shown in FIG. 29, again shown in order of decreasingprobability. Again, the probabilities can be calculated and compared tothe pruning threshold, and in this instance 5 candidates survive.

Finally, all combinations of the surviving candidates from iteration #2 and the candidate bases in position #3 (A and T) can be determined. The final candidates and their associated probabilities are shown in FIG. 30. Accordingly, when calculating the probabilities of individual variants, a heterozygous A→C SNP at position #2 of region #1 is determined, with a quality score of phred 50.4, which is the same result found in the brute-force calculation. In this example, pruning had no significant effect on the end result, but in general pruning may affect the calculation, often resulting in a more conservative confidence score.

There are many possible variations to the implementations of thisapproach, which may affect the performance and complexity of the system,and different variations may be appropriate for different scenarios. Forinstance, there can be variations in deciding which regions to include.For example, prior to running a Multi-Region Joint Detection, thevariant caller may be configured to determine whether a given activeregion should be processed individually or jointly with other regions,and if jointly, it may then determine which regions to include. In otherinstances, some implementations may rely on a list of secondaryalignments provided by the mapper so as to inform or otherwise make thisdecision. Other implementations may use a database of homologousregions, computed offline, such as based on a search of the referencegenome.

Accordingly, a useful step in such operations is deciding which positions to include. For instance, it is to be noted that various regions of interest may not be self-contained and/or isolated from adjacent regions. Hence, information in the pileup can influence the probability of bases separated by far more than the total read length (e.g., the paired read length or long molecule length). As such, it must be decided which positions to include in the MRJD calculation, and the number of positions that can be included is not unconstrained (even with pruning). For example, some implementations may process overlapping blocks of positions and update the results for a subset of the positions based on the confidence levels at those positions, or the completeness of the evidence at those positions (e.g., positions near the middle of the block typically have more complete evidence than those near the edge).

Another determining factor may be the order in which new positions maybe added. For instance, for pruned MRJD, the order of adding newpositions may affect performance. For example, some implementations mayadd new positions based on the distance to the already-includedpositions, or the degree of connectivity with these positions (e.g., thenumber of reads overlapping both positions). Additionally, there arealso many variations on how pruning may be performed. In the example setforth above, the pruning was based on a fixed probability threshold, butin general the pruning threshold may be adaptive or based on the numberof surviving candidates. For instance, a simple implementation may setthe threshold to keep the number of candidates below a fixed number,while a more sophisticated implementation may set the threshold based ona cost-benefit analysis of including additional candidates.

Various implementations may perform pruning based on the probabilities P(R|G_(m)) instead of the a-posteriori probabilities P(G_(m)|R). This has the advantage of allowing the elimination of equivalent mirror-image representations across regions (in addition to phases). This advantage is at least partially offset by the disadvantage of not pruning out candidates with very low a-priori probabilities, which in various instances may be beneficial. As such, a useful solution may depend on the scenario. If pruning is done, such as based on P(R|G_(m)), then the Bayesian calculation would be performed once after the final iteration.

Further in the example above, the process was stopped after processingall base positions in the pileup shown, but other stopping criteria arealso possible. For instance, if only a subset of the base positions(e.g. when processing overlapping blocks) is being solved for, theprocess may stop when the result for the subset has been found with asufficient level of confidence, or when the confidence has stoppedincreasing as more positions are added. A more sophisticatedimplementation, however, may perform some type of cost-benefit analysis,weighing the computational cost against the potential value of addingmore positions.

A-priori probabilities may also be useful. For instance, in the examplesabove, a simple IID model was used, but other models may also be used.For example, it is to be noted that clusters of variants are more commonthan would be predicted by an IID model. It is also to be noted thatvariants are more likely to occur at positions where the referencesdiffer. Therefore, incorporating such knowledge into the a-prioriprobabilities P(Gm) can improve the detection performance and yieldbetter ROC curves. Particularly, it is to be noted that the a-prioriprobabilities for homologous regions are not well-understood in thegenomics community, and this knowledge is still evolving. As such, someimplementations may update the a-priori models as better informationbecomes available. This may be done automatically as more results areproduced. Such updates may be based on other biological samples or otherregions of the genome for the same sample, which learnings can beapplied to the methods herein to further promote a more rapid andaccurate analysis.

Accordingly, in some instance, an iterative MJRD process may beimplemented. Specifically, the methodology described herein can beextended to allow message passing between related regions so as tofurther reduce the complexity and/or increase the detection performanceof the system. For instance, the output of the calculation at onelocation can be used as an input a-priori probability for thecalculation at a nearby location. Additionally, some implementations mayuse a combination of pruning and iterating to achieve the desiredperformance/complexity tradeoff.

Further, sample preparation may be implemented to optimize the MRJDprocess. For instance, for paired-end sequencing, it may be useful tohave a tight distribution on the insertion size when using conventionaldetection. However, in various instances, introducing variation in theinsertion size could significantly improve the performance for MRJD. Forexample, the sample may be prepared to intentionally introduce a bimodaldistribution, a multi-modal distribution, or bell-curve-likedistribution with a higher variance than would typically be implementedfor conventional detection.

FIG. 31 illustrates the ROC curves for MRJD and a conventional detectorfor human sample NA12878 over selected regions of the genome with asingle homologous copy, such that N=2, with varying degrees of referencesequence similarity. This dataset used paired-end sequencing with a readlength of 101 and a mean insertion size of approx. 400. As can be seenwith respect to FIG. 31, MRJD offers dramatically improved sensitivityand specificity over these regions than conventional detection methods.FIG. 32 illustrates the same results displayed as a function of thesequence similarity of the references, measured over a window of 1000bases (e.g. if the references differ by 10 bases out of 1000, then thesimilarity is 99.0 percent). For this dataset, it may be seen thatconventional detection starts to perform badly at a sequence similarity˜0.98, while MRJD performs quite well up to 0.995 and even beyond.

Additionally, in various instances, this methodology may be extended toallow message passing between related regions to further reduce thecomplexity and/or increase the detection performance. For instance, theoutput of the calculation at one location can be used as an inputa-priori probability for the calculation at a nearby location, and insome implementations may use a combination of pruning and iterating toachieve the desired performance/complexity tradeoff. In particularinstances, as indicated above, prior to running multi-region jointdetection, the variant caller may determine whether a given activeregion should be processed individually or jointly with other regions.Additionally, as indicated above, some implementations may rely on alist of secondary alignments provided by the mapper to make such adecision. Other implementations may use a database of homologousregions, computed offline based on a search of the reference genome.

In view of the above, a partially determined pair Hidden Markov Model (PD-HMM) may be implemented in a manner so as to take advantage of the benefits of MRJD. For instance, MRJD can separately estimate the probability of observing a portion or all of the reads given each possible joint diplotype, which comprises one haplotype per ploidy per homologous reference region, e.g., for two homologous regions in diploid chromosomes, each joint diplotype will include four haplotypes. In such instances, all or a portion of the possible haplotypes may be considered, such as by being constructed, for instance, by modifying each reference region with every possible subset of all the variants for which there is nontrivial evidence. However, for long homologous reference regions, the number of possible variants is large, so the number of haplotypes (combinations of variants) becomes exponentially large, and the number of joint diplotypes (combinations of haplotypes) may be astronomical.

Consequently, to keep MRJD calculations tractable, it may not be usefulto test all possible joint diplotypes. Rather, in some instances, thesystem may be configured in such a manner that only a small subset of“most likely” joint diplotypes is tested. These “most likely” jointdiplotypes may be determined by incrementally constructing a tree ofpartially-determined joint diplotypes. In such an instance, each node ofthe tree may be a partially determined joint diplotype that includes apartially determined haplotype per ploidy per homologous referenceregion. In this instance, a partially determined haplotype may include areference region modified by a partially determined subset of thepossible variants. Accordingly, a partially determined subset of thepossible variants may include an indication, for each possible variant,of one of three states: that the variant is determined and present, orthe variant is determined and absent, or the variant is not yetdetermined, e.g., it may be present or absent. At the root of the tree,all variants are undetermined in all haplotypes; tree nodes branchingsuccessively further from the root have successively more variantsdetermined as present or absent in each haplotype of each node's jointdiplotype.
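One possible software representation of this three-state bookkeeping is sketched below; the class and field names are illustrative assumptions, not the described hardware structures, and each node of the tree would hold one such partially determined haplotype per ploidy per region.

```python
from enum import Enum
from dataclasses import dataclass, field

class VarState(Enum):
    UNDETERMINED = 0     # the variant may be present or absent
    PRESENT = 1          # determined and present
    ABSENT = 2           # determined and absent

@dataclass
class PartialHaplotype:
    region: int
    phase: int
    variants: dict = field(default_factory=dict)   # variant id -> VarState (default: UNDETERMINED)

    def determine(self, var_id, present):
        """Return a child node one branch further from the root, with one more variant pinned down."""
        child = PartialHaplotype(self.region, self.phase, dict(self.variants))
        child.variants[var_id] = VarState.PRESENT if present else VarState.ABSENT
        return child
```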

Further, in the context of this joint diplotype tree, as describedabove, the amount of MRJD calculations is kept limited and tractable bytrimming branches of the tree in which all joint diplotype nodes areunlikely, e.g., moderately to extremely unlikely, relative to other morelikely branches or nodes. Accordingly, such trimming may be performed onbranches at nodes that are still only partially determined; e.g.,several or many variants are still not determined as present or absentfrom the haplotypes of a trimmed node's joint diplotype. Thus, in suchan instance, it is useful to be able to estimate or bound the likelihoodof observing each read assuming the truth of a partially determinedhaplotype. A modified pair hidden Markov model (pHMM) calculation,denoted “PD-HMM” for “partially determined pair hidden Markov model” isuseful to estimate the probability P(R|H) of observing read R assumingthe true haplotype H* is consistent with partially determined haplotypeH. Consistent in this context means that some specific true haplotype H*agrees with partially determined haplotype H with respect to allvariants whose presence or absence are determined in H, but for variantsundetermined in H, H* may agree with the reference sequence eithermodified or unmodified by each undetermined variant.

Note that it is not generally adequate to run an ordinary pHMMcalculation for some shorter sub-haplotype of H chosen to encompass onlydetermined variant positions. It is generally important to build thejoint diplotype tree with undetermined variants being resolved in anefficient order, which is generally quite different than their geometricorder, so that a partially determined haplotype H will typically havemany undetermined variant positions interleaved with determined ones. Toproperly consider PCR indel errors, it is useful to use a pHMM-likecalculation spanning through all determined variants and significantradius around them, which may not be compatible with attempts to avoidundetermined variant positions.

Accordingly, the inputs to PD-HMM may include the called nucleotidesequence of read R, the base quality scores (e.g., phred scale) of thecalled nucleotides of R, a baseline haplotype H0, and a list ofundetermined variants (edits) from H0. The undetermined variants mayinclude single-base substitutions (SNPs), multiple-base substitutions(MNPs), insertions, and deletions. Advantageously, it may be adequate tosupport undetermined SNPs and deletions. An undetermined MNP may beimperfectly but adequately represented as multiple independent SNPs. Anundetermined insertion may be represented by first editing the insertioninto the baseline haplotype, then indicating the correspondingundetermined deletion which would undo that insertion.
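
The following sketch illustrates one way the PD-HMM inputs described above could be packaged, together with the insertion-as-deletion representation; the structure and names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PdHmmInputs:
    read: str                            # called nucleotide sequence of read R
    base_quals: List[int]                # phred-scaled quality per called base of R
    baseline_hap: str                    # baseline haplotype H0
    undet_snps: List[Tuple[int, str]]    # undetermined SNPs: (H0 position, alternate base)
    undet_dels: List[Tuple[int, int]]    # undetermined deletions: (H0 start, deleted length)

def represent_undetermined_insertion(h0: str, pos: int, inserted: str):
    """An undetermined insertion is handled by first editing it into H0 and then
    listing the undetermined deletion that would undo it (as described above)."""
    edited_h0 = h0[:pos] + inserted + h0[pos:]
    undo_deletion = (pos, len(inserted))
    return edited_h0, undo_deletion
```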

Restrictions may be placed on the undetermined deletions, to facilitate hardware engine implementation with limited state memory and logic, such as that no two undetermined deletions may overlap (delete the same baseline haplotype bases). If a partially determined haplotype must be tested with undetermined variants violating such restrictions, this may be resolved by converting one or more undetermined variants into determined variants in a larger number of PD-HMM operations, covering cases with those variants present or absent. For example, if two undetermined deletions A and B violate this restriction by overlapping each other in baseline haplotype H0, then deletion B may be edited into H0 to yield H0B, and two PD-HMM operations may be performed using undetermined deletion A only, one for baseline haplotype H0, and the other for baseline haplotype H0B, and the maximum probability output of the two PD-HMM operations may be retained.
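
A short sketch of the overlap test and the split-and-maximize resolution described above, assuming deletions are given as (start, length) pairs in H0 coordinates; the pd_hmm callable is a stand-in for the underlying PD-HMM operation and is not part of the disclosure.

```python
def deletions_overlap(a, b):
    """a and b are undetermined deletions as (start, length) in H0 coordinates;
    True if they would delete at least one common baseline haplotype base."""
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]

def pd_hmm_overlap_split(read, quals, h0, del_a, del_b, pd_hmm):
    """Resolve two overlapping undetermined deletions by determining B:
    run PD-HMM once with B absent (baseline H0) and once with B present
    (baseline H0B = H0 with B edited in), keeping only A undetermined,
    and retain the larger probability. (In a full implementation, A's
    coordinates would need adjusting on H0B when A lies to the right of B.)"""
    b_start, b_len = del_b
    h0_b = h0[:b_start] + h0[b_start + b_len:]       # apply deletion B to H0
    p_b_absent  = pd_hmm(read, quals, h0,   [del_a])
    p_b_present = pd_hmm(read, quals, h0_b, [del_a])
    return max(p_b_absent, p_b_present)
```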

The result of a PD-HMM operation may be an estimate of the maximum P(R|H*) among all haplotypes H* that can be formed by editing H0 with any subset of the undetermined variants. The maximization may be done locally, contributing to the pHMM-like dynamic programming in a given cell as if an adjacent undetermined variant is present or absent from the haplotype, whichever scores better, e.g., contributes the greater partial probability. Such local maximization during dynamic programming may result in larger estimates of the maximum P(R|H*) than true maximization over individual pure H* haplotypes, but the difference is generally inconsequential.

Undetermined SNPs may be incorporated into PD-HMM by allowing one or more matching nucleotide values to be specified for each haplotype position. For example, if base 30 of H0 is ‘C’ and an undetermined SNP replaces this ‘C’ with a ‘T’, then the PD-HMM operation's haplotype may indicate position 30 as matching both bases ‘C’ and ‘T’. In the usual pHMM dynamic programming, any transition to an ‘M’ state results in multiplying the path probability by the probability of a correct base call (if the haplotype position matches the read position) or by the probability of a specific base call error (if the haplotype position mismatches the read position); for PD-HMM this is modified by using the correct-call probability if the read position matches either possible haplotype base (e.g., ‘C’ or ‘T’), and the base-call-error probability otherwise.
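
A small sketch of the modified ‘M’-state emission factor described above; dividing the error probability by 3 to obtain a specific base-call-error probability is one common convention and is an assumption here, not something stated in the disclosure.

```python
def match_emission(read_base: str, allowed_hap_bases: set, base_qual: int) -> float:
    """Probability factor applied on a transition into the 'M' state.
    For PD-HMM, the correct-call probability is used if the read base matches
    ANY allowed haplotype base at this position (e.g. {'C', 'T'} for a position
    covered by an undetermined SNP), and the base-call-error probability otherwise."""
    p_error = 10.0 ** (-base_qual / 10.0)    # phred-scaled total error probability
    p_correct = 1.0 - p_error
    return p_correct if read_base in allowed_hap_bases else p_error / 3.0
```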

Undetermined haplotype deletions may be incorporated into PD-HMM by flagging optionally-deleted haplotype positions, and modifying the dynamic programming of pHMM to allow alignment paths to skip horizontally across undetermined deletion haplotype segments without probability loss. This may be done in various manners, but with the common property that probability values in M, I, and/or D states can transmit horizontally (along the haplotype axis) over the span of an undetermined deletion without being reduced by ordinary gap-open or gap-extend probabilities.

In one particular embodiment, haplotype positions where undetermined deletions begin are flagged “F1,” and positions where undetermined deletions end are flagged “F2.” In addition to the M, I, and D “states” (partial probability representations) for each cell of the HMM matrix (haplotype horizontal/read vertical), each PD-HMM cell may further include BM, BI, and BD “bypass” states. In F1-flagged haplotype columns, the BM, BI, and BD states receive values copied from the M, I, and D states of the cell to the left, respectively. In non-F2-flagged haplotype columns, particularly columns starting with an F1-flagged column and extending into the interior of an undetermined deletion, the BM, BI, and BD states transmit their values to the BM, BI, and BD states of the cell to the right, respectively. In F2-flagged haplotype columns, in place of the M, I, and D states used to calculate states of adjacent cells, the maximum of M and BM is used, the maximum of I and BI is used, and the maximum of D and BD is used, respectively. This may be exemplified in an F2 column as multiplexed selection of signals from M and BM, from I and BI, and from D and BD registers.
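
The following schematic Python model illustrates how the BM/BI/BD bypass states and the F1/F2 flags described above interact within one row of the dynamic programming; it is a sketch only, not the hardware engine: boundary conditions, transition coefficients, and scaling are omitted, and the function and parameter names are assumptions made for illustration.

```python
def exported_states(cell, is_f2):
    """Values a cell exports to adjacent cells: in an F2-flagged column the
    bypass value may replace the normal value, whichever is larger."""
    m, i, d, bm, bi, bd = cell
    return (max(m, bm), max(i, bi), max(d, bd)) if is_f2 else (m, i, d)

def pd_hmm_row(prev_row, cur_row, f1, f2, emit, gap_open, gap_ext):
    """Compute one read row of the PD-HMM matrix, sweeping haplotype columns
    left to right. Each cell is a tuple (M, I, D, BM, BI, BD)."""
    for j in range(1, len(f1)):
        dM, dI, dD = exported_states(prev_row[j - 1], f2[j - 1])   # diagonal cell
        uM, uI, uD = exported_states(prev_row[j],     f2[j])       # cell above
        lM, lI, lD = exported_states(cur_row[j - 1],  f2[j - 1])   # cell to the left

        m = emit(j) * (dM + dI + dD)        # match/mismatch emission (coefficients omitted)
        i = uM * gap_open + uI * gap_ext    # insertion: gap in the haplotype
        d = lM * gap_open + lD * gap_ext    # deletion: gap in the read

        if f1[j]:
            # Start of an undetermined deletion: capture the plain M/I/D of the left cell.
            bm, bi, bd = cur_row[j - 1][0], cur_row[j - 1][1], cur_row[j - 1][2]
        elif not f2[j - 1]:
            # Interior of an undetermined deletion: transmit bypass values rightward unchanged.
            bm, bi, bd = cur_row[j - 1][3], cur_row[j - 1][4], cur_row[j - 1][5]
        else:
            bm, bi, bd = 0.0, 0.0, 0.0      # bypass not active past an F2 column

        cur_row[j] = (m, i, d, bm, bi, bd)
    return cur_row
```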

Note that although BM, BI, and BD state registers may be represented in F1 through F2 columns, and maximizing M/BM, I/BI, and D/BD multiplexers may be shown in an F2 column, these components may be present for all cell calculations, enabling an undetermined deletion to be handled in any position, and enabling multiple undetermined deletions with corresponding F1 and F2 flags throughout the haplotype. Note also that F1 and F2 flags may be in the same column, for the case of a single-base undetermined deletion. It is further to be noted that the PD-HMM matrix of cells may be depicted as a schematic representation of the logical M, I, D, BM, BI, and BD state calculations, but in a hardware implementation, a smaller number of cell-calculating logic elements may be present, and pipelined appropriately to calculate M, D, I, BM, BI, and BD state values at high clock frequencies, and the matrix cells may be calculated with various degrees of hardware parallelism, in various orders consistent with the inherent logical dependencies of the PD-HMM calculation.

Thus, in this embodiment, the pHMM state values in the column immediately left of an undetermined deletion may be captured and transmitted rightward, unchanged, to the rightmost column of this undetermined deletion, where they substitute into the pHMM calculations whenever they beat the normal-path scores. Where these maxima are chosen, the “bypass” state values BM, BI, and BD represent the local dynamic programming results where the undetermined deletion is taken to be present, while the “normal” state values M, I, and D represent the local dynamic programming results where the undetermined deletion is taken to be absent.

In another embodiment, a single bypass state may be used, such as a BM state receiving from an M state in F1-flagged columns, or receiving a sum of M, D, and/or I states. In another embodiment, rather than using “bypass” states, gap-open and/or gap-extend penalties are eliminated within columns of undetermined deletions. In another embodiment, bypass states contribute additively to dynamic programming rightward of undetermined deletions, rather than local maximization being used. In a further embodiment, more or fewer or differently defined or differently located haplotype position flags are used to trigger bypass or similar behavior, such as a single flag indicating membership in an undetermined deletion. In an additional embodiment, two or more overlapping undetermined deletions may participate, such as with the use of additional flags and/or bypass states. Additionally, undetermined insertions in the haplotype are supported, rather than, or in addition to, undetermined deletions. Likewise, undetermined insertions and/or deletions on the read axis are supported, rather than or in addition to undetermined deletions and/or insertions on the haplotype axis. In another embodiment, undetermined multiple-nucleotide substitutions are supported as atomic variants (all present or all absent). In a further embodiment, undetermined length-varying substitutions are supported as atomic variants. In another embodiment, undetermined variants are penalized with fixed or configurable probability or score adjustments.

This PD-HMM calculation may be implemented as a hardware engine, such as in FPGA or ASIC technology, by extension of a hardware engine architecture for “ordinary” pHMM calculation, or may be implemented by one or more quantum circuits in a quantum computing platform. In addition to engine pipeline logic to calculate, transmit, and store M, I, and D state values for various or successive cells, parallel pipeline logic can be constructed to calculate, transmit, and store BM, BI, and BD state values, as described herein and above. Memory resources and ports for storage and retrieval of M, I, and D state values can be accompanied by similar or wider or deeper memory resources and ports for storage and retrieval of BM, BI, and BD state values. Flags such as F1 and F2 may be stored in memories along with the associated haplotype bases.

Multiple matching nucleotides for, e.g., undetermined SNP haplotype positions may be encoded in any manner, such as using a vector of one bit per possible nucleotide value. Cell calculation dependencies in the pHMM matrix are unchanged in PD-HMM, so the order and pipelining of multiple cell calculations can remain the same for PD-HMM. However, the latency in time and/or clock cycles for a complete cell calculation increases somewhat for PD-HMM, due to the requirement to compare “normal” and “bypass” state values and select the larger ones. Accordingly, it may be advantageous to include one or more extra pipeline stages for PD-HMM cell calculation, resulting in additional clock cycles of latency. Additionally, it may further be advantageous to widen each “swath” of cells calculated by one or more rows, to keep the longer pipeline filled without dependency issues.
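
As a concrete illustration of the one-bit-per-nucleotide encoding mentioned above, the following sketch packs the set of allowed haplotype bases into a 4-bit code and tests a read base against it; the particular bit assignment is an assumption for illustration.

```python
# One bit per nucleotide value (illustrative assignment): A=1, C=2, G=4, T=8
BASE_BIT = {"A": 1, "C": 2, "G": 4, "T": 8}

def encode_allowed(bases: str) -> int:
    """Encode the set of haplotype bases allowed at one position,
    e.g. 'CT' for a position covered by an undetermined C->T SNP."""
    code = 0
    for b in bases:
        code |= BASE_BIT[b]
    return code

def read_base_matches(read_base: str, allowed_code: int) -> bool:
    """True if the read base matches any allowed haplotype base at this position."""
    return bool(BASE_BIT[read_base] & allowed_code)
```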

This PD-HMM calculation tracks twice as many state values (BM, BI, and BD, in addition to M, I, and D) as an ordinary pHMM calculation, and may require about twice the hardware resources for an equivalent-throughput engine embodiment. However, a PD-HMM engine has exponential speed and efficiency advantages for increasing numbers of undetermined variants, versus an ordinary pHMM engine run once for each haplotype representing a distinct combination of the undetermined variants being present or absent. For example, if a partially determined haplotype has 30 undetermined variants, each of which may be independently present or absent, there are 2^30, or more than 1 billion, distinct specific haplotypes that pHMM would otherwise need to process.

Accordingly, in view of the above, for embodiments involving FPGA-accelerated mapping, alignment, sorting, and/or variant calling applications, one or more of these functions may be implemented in one or both of software and hardware (HW) processing components, such as software running on a traditional CPU, and/or firmware such as may be embodied in an FPGA, ASIC, sASIC, and the like. In such instances, the CPU and FPGA need to be able to communicate so as to pass results from one step on one device, e.g., the CPU or FPGA, to be processed in a next step on the other device. For instance, where a mapping function is run, the building of large data structures, such as an index of the reference, may be implemented by the CPU, whereas the running of a hash function with respect thereto may be implemented by the FPGA. In such an instance, the CPU may build the data structure and store it in an associated memory, such as a DRAM, which memory may then be accessed by the processing engines running on the FPGA.

For instance, in some embodiments, communications between the CPU and the FPGA may be implemented by any suitable interconnect, such as a peripheral bus, e.g., a PCIe bus or USB, or a networking interface such as Ethernet. However, a PCIe bus may provide a comparatively loose integration between the CPU and FPGA, whereby transmission latencies between the two may be relatively high. Accordingly, although one device (e.g., the CPU or FPGA) may access the memory attached to the other device (e.g., by a DMA transfer), the memory region(s) accessed are non-cacheable, because there is no facility to maintain cache coherency between the two devices. As a consequence, transmissions between the CPU and FPGA are constrained to occur between large, high-level processing steps, and a large amount of input and output must be queued up between the devices so they don't slow each other down waiting for high-latency operations. This slows down the various processing operations disclosed herein. Furthermore, when the FPGA accesses non-cacheable CPU memory, the full load of such access is imposed on the CPU's external memory interfaces, which are bandwidth-limited compared to its internal cache interfaces.

Accordingly, because of such loose CPU/FPGA integrations, it is generally necessary to have “centralized” software control over the FPGA interface. In such instances, the various software threads may be processing various data units, but when these threads generate work for the FPGA engine to perform, the work must be aggregated in “central” buffers, such as either by a single aggregator software thread, or by multiple threads locking aggregation access via semaphores, with transmission of aggregated work via DMA packets managed by a central software module, such as a kernel-space driver. Hence, as results are produced by the HW engines, the reverse process occurs, with a software driver receiving DMA packets from the HW, and a de-aggregator thread distributing results to the various waiting software worker threads. However, this centralized software control of communication with HW FPGA logic is cumbersome and expensive in resource usage, reduces the efficiency of software threading and HW/software communication, limits the practical HW/software communication bandwidth, and dramatically increases its latency.

Additionally, as can be seen with respect to FIG. 33A, a loose integration between the CPU 1000 and FPGA 7 may require each device to have its own dedicated external memory, such as DRAMs 1014, 14. As depicted in FIG. 33A, the CPU(s) 1000 has its own DRAM 1014 on the system motherboard, such as DDR3 or DDR4 DIMMs, while the FPGA 7 has its own dedicated DRAMs 14, such as four 8 GB SODIMMs, that may be directly connected to the FPGA 7 via one or more DDR3 busses 6, with the FPGA 7 itself attached to the CPU via a high-latency bus, e.g., a PCIe bus. Likewise, the CPU 1000 may be communicably coupled to its own DRAM 1014, such as by a suitably configured bus 1006. As indicated above, the FPGA 7 may be configured to include one or more processing engines 13, which processing engines may be configured for performing one or more functions in a bioinformatics pipeline as herein described, such as where the FPGA 7 includes a mapping engine 13a, an alignment engine 13b, and a variant call engine 13c. Other engines as described herein may also be included. In various embodiments, one or both of the CPU and the FPGA may be configured so as to include a cache 1014a, 14a, respectively, that is capable of storing data, such as result data that is transferred thereto by one or more of the various components of the system, such as one or more memories and/or processing engines.

Many of the operations disclosed herein, to be performed by the FPGA 7 for genomic processing, require large memory accesses for the performance of the underlying operations. Specifically, due to the large data units involved, e.g., 3+ billion nucleotide reference genomes, 100+ billion nucleotides of sequencer read data, etc., the FPGA 7 may need to access the host memory 1014 a large number of times, such as for accessing an index, such as a 30 GB hash table or other reference genome index, e.g., for the purpose of mapping the seeds from a sequenced DNA/RNA query to a 3 Gbp reference genome, and/or for fetching candidate segments, e.g., from the reference genome, to align against.
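
To illustrate why these accesses are numerous and small, the following sketch shows seed lookups against an in-memory hash index of the reference; the seed length and the dictionary-based index are assumptions for illustration, not the engine's actual data structure.

```python
def map_read_seeds(read: str, hash_index: dict, seed_len: int = 21):
    """For each k-mer seed extracted from the read, look up candidate reference
    positions in a (potentially tens-of-GB) hash index held in host memory.
    Each lookup is a small random access, which is why a low-latency,
    cache-coherent path to host memory matters for the mapping engine."""
    candidates = []
    for offset in range(0, len(read) - seed_len + 1):
        seed = read[offset:offset + seed_len]
        for ref_pos in hash_index.get(seed, ()):     # one small random access per seed
            candidates.append((offset, ref_pos))
    return candidates
```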

Accordingly, in various implementations of the system herein disclosed, many rapid random memory accesses may need to occur by one or more of the hardwired processing engines 13, such as in the performance of a mapping, aligning, and/or variant calling operation. However, it may be prohibitively impractical for the FPGA 7 to make so many small random accesses over the peripheral bus 3 or other networking link to the memory 1014 attached to the host CPU 1000. For instance, in such instances, latencies of return data can be very high, bus efficiency can be very low, e.g., for such small random accesses, and the burden on the CPU external memory interface 1006 may be prohibitively great.

Additionally, as a result of each device needing its own dedicated external memory, the typical form factor of the full CPU 1000 + FPGA 7 platform is forced to be larger than may be desirable, e.g., for some applications. In such instances, in addition to a standard system motherboard for one or more CPUs 1000 and supporting chips 7 and memories 1014 and/or 14, room is needed on the board for a large FPGA package (which may even need to be larger so as to have enough pins for several external memory busses) and several memory modules 1014, 14. Standard motherboards, however, do not include these components, nor would they easily have room for them, so a practical embodiment may be configured to utilize an expansion card 2, containing the FPGA 7, its memory 14, and other supporting components, such as a power supply, e.g., connected to a PCIe expansion slot on the CPU motherboard. To have room for the expansion card 2, the system may be fabricated in a large enough chassis, such as a 1U or 2U or larger rack-mount server.

In view of the above, in various instances, as can be seen with respect to FIG. 33B, to overcome these factors, it may be desirable to configure the CPU 1000 to be in a tight coupling arrangement with the FPGA 7. Particularly, in various instances, the FPGA 7 may be tightly coupled to the CPU 1000, such as by a low-latency interconnect 3, such as a quick path interconnect (QPI). Specifically, to establish a tighter CPU+FPGA integration, the two devices may be connected by any suitable low-latency interface, such as a “processor interconnect” or similar, such as INTEL® Quick Path Interconnect (QPI) or HyperTransport (HT).

Accordingly, as seen with respect to FIG. 33B, a system 1 is provided wherein the system includes both a CPU 1000 and a processor, such as an FPGA 7, wherein both devices are associated with one or more memory modules. For instance, as depicted, the CPU 1000 may be coupled, such as via a suitably configured bus 1006, to a DRAM 1014, and likewise, the FPGA 7 is communicably coupled to an associated memory 14 via a DDR3 bus 6. However, in this instance, instead of being coupled to one another by a typical high-latency interconnect, e.g., a PCIe interface, the CPU 1000 is coupled to the FPGA 7 by a low-latency interconnect 3, such as a QPI or HyperTransport link. In such an instance, due to the inherent low-latency nature of such interconnects, the associated memories 1014, 14 of the CPU 1000 and the FPGA 7 are readily accessible to one another. Additionally, in various instances, due to this tight coupling configuration, one or more caches 1014a/14a associated with the devices may be configured so as to be coherent with respect to one another.

Some key properties of such a tightly coupled CPU/FPGA interconnect include high bandwidth, e.g., 12.8 GB/s; low latency, e.g., 100-300 ns; an adapted protocol designed for allowing efficient remote memory accesses and efficient small memory transfers, e.g., on the order of 64 bytes or less; and a supported protocol and CPU integration for cache access and cache coherency. In such instances, a natural interconnect for use for such tight integration with a given CPU 1000 may be its native CPU-to-CPU interconnect 1003, which may be employed herein to enable multiple cores and multiple CPUs to operate in parallel in a shared memory 1014 space, thereby allowing the accessing of each other's cache stacks and external memory in a cache-coherent manner.

Accordingly, as can be seen with respect to FIGS. 34A and 34B, a board 2 may be provided, such as where the board may be configured to receive one or more CPUs 1000, such as via a plurality of interconnects 1003, such as native CPU-CPU interconnects 1003a and 1003b. However, in this instance, as depicted in FIG. 34A, a CPU 1000 is configured so as to be coupled to the interconnect 1003a, but rather than another CPU being coupled therewith via interconnect 1003b, an FPGA 7 of the disclosure is configured so as to be coupled therewith. Additionally, the system 1 is configured such that the CPU 1000 may be coupled to the associated FPGA 7, such as by a low-latency, tight coupling interconnect 3. In such instances, each memory 1014, 14 associated with the respective devices 1000, 7 may be made so as to be accessible to the other, such as in a high-bandwidth, cache-coherent manner.

Likewise, as can be seen with respect to FIG. 34B, the system can also be configured so as to receive packages 1002a and/or 1002b, such as where each of the packages includes one or more CPUs 1000a, 1000b that are tightly coupled, e.g., via low-latency interconnects 3a and 3b, to one or more FPGAs 7a, 7b, such as where, given the system architecture, each package 2a and 2b may be coupled one with the other such as via a tight coupling interconnect 3. Further, as can be seen with respect to FIG. 35, in various instances, a package 1002a may be provided, wherein the package 1002a includes a CPU 1000 that has been fabricated in such a manner so as to be closely coupled with an integrated circuit such as an FPGA 7. In such an instance, because of the close coupling of the CPU 1000 and the FPGA 7, the system may be constructed such that they are able to directly share a cache 1014a in a manner that is consistent, coherent, and readily accessible by either device, such as with respect to the data stored therein.

Hence, in such instances, the FPGA 7, and/or package 2a/2b, can, in effect, masquerade as another CPU, and thereby operate in a cache-coherent shared-memory environment with one or more CPUs, just as multiple CPUs would on a multi-socket motherboard 1002, or multiple CPU cores would within a multi-core CPU device. With such an FPGA/CPU interconnect, the FPGA 7 can efficiently share CPU memory 1014, rather than having its own dedicated external memory 14, which may or may not be included or accessed. Thus, in such a configuration, rapid, short, random accesses are supported efficiently by the interconnect 3, such as with low latency. This makes it practical and efficient for the various processing engines 13 in the FPGA 7 to access large data structures in the CPU memory 1014.

For instance, as can be seen with respect to FIG. 37, a system for performing a method is provided, such as where the method includes one or more steps for performing a function of the disclosure, such as a mapping function, as described herein, in a shared manner. Particularly, in a first step (1), a data structure may be generated or otherwise provided, such as by a CPU 1000, which data structure may then be stored (2) in an associated memory, such as a DRAM 1014. The data structure may be any data structure, such as with respect to those described herein, but in this instance may be a reference genome or an index of the reference genome, such as for the performance of a mapping and/or aligning or variant calling function. In a second step, such as with respect to a mapping function, an FPGA 7 associated with the CPU 1000, such as by a tight coupling interface 3, may access the CPU-associated memory 1014, so as to perform one or more actions with respect to the reference genome and/or an index thereof. Particularly, in a step (3), the FPGA 7 may access the data structure so as to produce one or more seeds thereof, which seeds may be employed for the purposes of performing a hash function with respect thereto, such as to produce one or more reads that have been mapped to one or more positions with respect to the reference genome.

In a further step, the mapped result data may be stored, e.g., in either the host memory 1014 or in an associated DRAM 14. In such an instance, the FPGA 7, more particularly a processing engine 13 thereof, e.g., an alignment engine, may then access the stored mapped data structure so as to perform an aligning function thereon, so as to produce one or more reads that have been aligned to the reference genome. In an additional step (4), the host CPU may then access the mapped and/or aligned data so as to perform one or more functions thereon, such as for the production of a De Bruijn graph (DBG), which DBG may then be stored in its associated memory. Likewise, in one or more additional steps, the FPGA 7 may once again access the host CPU memory 1014 so as to access the DBG and perform an HMM analysis thereon so as to produce one or more variant call files.
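
A high-level sketch of the handoffs just described, assuming hypothetical cpu/fpga objects whose methods operate on structures reachable by both devices through the cache-coherent shared memory; all method and file names are illustrative placeholders.

```python
def shared_memory_pipeline(cpu, fpga, shared_mem):
    """Schematic of CPU/FPGA cooperation over shared memory for a
    mapping -> aligning -> graph construction -> HMM variant calling flow."""
    # (1)-(2) CPU builds the reference index and stores it in the shared DRAM
    shared_mem["ref_index"] = cpu.build_reference_index("reference.fasta")

    # (3) FPGA mapping engine reads the index in place and maps the reads
    shared_mem["mapped"] = fpga.map_reads(shared_mem["ref_index"], shared_mem["reads"])

    # FPGA alignment engine aligns the mapped reads against the reference
    shared_mem["aligned"] = fpga.align_reads(shared_mem["mapped"], shared_mem["ref_index"])

    # (4) CPU assembles a De Bruijn graph from the aligned reads
    shared_mem["dbg"] = cpu.build_de_bruijn_graph(shared_mem["aligned"])

    # FPGA HMM engine consumes the graph/haplotypes and emits variant calls
    return fpga.hmm_variant_call(shared_mem["dbg"], shared_mem["aligned"])
```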

In particular instances, the CPU 1000 and/or FPGA 7 may have one or more memory caches which, due to the tight coupling of the interface between the two devices, will allow the separate caches to be coherent, such as with respect to the transitory data, e.g., results data, stored thereon, such as results from the performance of one or more functions herein. In a manner such as this, data may be shared substantially seamlessly between the tightly coupled devices, thereby allowing a pipeline of functions to be weaved together such as in a bioinformatics pipeline. Thus, in such an instance, it may no longer be necessary for the FPGA 7 to have its own dedicated external memory 14 attached, and hence, due to such a tight coupling configuration, the reference genome and/or reference genomic index, as herein described, may be intensively shared, e.g., in a cache-coherent manner, such as for read mapping and alignment, and other genomic data processing operations.

Additionally, the low latency and cache coherency, as well as other components discussed herein, allow smaller, lower-level operations to be performed in one device (e.g., in a CPU or FPGA) before handing a data unit or processing thread 20 back to the other device, such as for further processing. For example, rather than a CPU thread 20a queuing up large amounts of work for the FPGA hardware logic 13 to perform, and the same or another thread 20b processing a large queue of results at a substantially later time, a single CPU thread 20 might make a blocking “function call” to an FPGA hardware engine 13, resuming software execution as soon as the hardware function completes. Hence, rather than packaging up data structures in packets to stream by DMA 14 into the FPGA 7, and unpacking results when they return, a software thread 20 could simply provide a memory pointer to the FPGA engine 13, which could access and modify the shared memory 1014/14 in place, in a cache-coherent manner.
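
A minimal sketch of such a blocking, pointer-based handoff; the engine object and its run method are hypothetical stand-ins for a driver/engine interface, and the example assumes the data unit is a ctypes structure allocated in the shared region.

```python
import ctypes

def call_fpga_engine_blocking(engine, data):
    """Hypothetical blocking call: instead of packing 'data' into DMA packets,
    the software thread hands the engine a pointer to the structure in
    cache-coherent shared memory; the engine reads and annotates it in place,
    and the thread resumes when the hardware function completes."""
    handle = ctypes.addressof(data)   # pointer into shared memory (illustrative)
    engine.run(handle)                # blocks until the engine signals completion
    return data                       # results were written in place
```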

Particularly, given the relationship between the structures provided herein, the granularity of the software/hardware cooperation can be much finer, with much smaller, lower-level operations being allocated so as to be performed by various hardware engines 13, such as function calls from various allocated software threads 20. For example, in a loose CPU/FPGA interconnect platform, for efficient acceleration of DNA/RNA read mapping, alignment, and/or variant calling, a full mapping/aligning/variant calling pipeline may be constructed as one or more FPGA engines, with unmapped and unaligned reads streamed from software to hardware, and the fully mapped and aligned reads streamed from the hardware back to the software, where the process may be repeated, such as for variant calling. With respect to the configurations herein described, this can be very fast; however, in various instances, it may suffer from limitations of flexibility, complexity, and/or programmability, such as because the whole map/align and/or variant call pipeline is implemented in hardware circuitry, which, although reconfigurable in an FPGA, is generally much less flexible and programmable than software, and may therefore be limited to less algorithmic complexity.

By contrast, using a tight CPU/FPGA interconnect, such as a QPI or other interconnect in the configurations disclosed herein, several resource-expensive discrete operations, such as seed generation and/or mapping, rescue scanning, gapless alignment, gapped, e.g., Smith-Waterman, alignment, etc., can be implemented as distinct, separately accessible hardware engines 13, e.g., see FIG. 38, and the overall mapping/alignment and/or variant call algorithms can be implemented in software, with low-level acceleration calls to the FPGA for the specific expensive processing steps. This framework allows full software programmability outside the specific acceleration calls, and enables greater algorithmic complexity and flexibility than standard hardware-implemented operations.

Furthermore, in such a framework of software execution accelerated by discrete low-level FPGA hardware acceleration calls, hardware acceleration functions may more easily be shared for multiple purposes. For instance, when hardware engines 13 form large, monolithic pipelines, the individual pipeline subcomponents may generally be specialized to their environment, and interconnected only within one pipeline, which, unless tightly coupled, may not generally be accessible for any other purpose. But many genomic data processing operations, such as Smith-Waterman alignment, gapless alignment, De Bruijn or assembly graph construction, and other such operations, can be used in various higher-level parent algorithms. For example, as described herein, Smith-Waterman alignment may be used in DNA/RNA read mapping, such as with respect to a reference genome, but may also be configured so as to be used by haplotype-based variant callers, to align candidate haplotypes to a reference genome, or to each other, or to sequenced reads, such as in an HMM analysis. Hence, exposing various discrete low-level hardware acceleration functions via general software function calls may enable the same acceleration logic, e.g., 13, to be leveraged throughout a genomic data processing application, such as in the performance of both alignment and variant calling, e.g., HMM, operations.

It is also practical, with tight CPU/FPGA interconnection, to have distributed rather than centralized CPU 1000 software control over communication with the various FPGA hardware engines 13 described herein. In widespread practices of multi-threaded, multi-core, and multi-CPU software design, many software threads and processes communicate and cooperate seamlessly, without any central software modules, drivers, or threads to manage intercommunication. In such a format, this is practical because of the cache-coherent shared memory, which is visible to all threads in all cores in all of the CPUs; while physically, coherent memory sharing between the cores and CPUs occurs by intercommunication over the processor interconnect, e.g., QPI or HT.

In a similar manner, as can be seen with respect to FIGS. 36 and 38, with the tight CPU/FPGA interconnect disclosed herein, many threads 20a, 20b, 20c and processes running on one or multiple cores and/or CPUs 1000a, 1000b, and 1000c can communicate and cooperate in a distributed manner with the various different FPGA hardware acceleration engines, such as by the use of cache-coherent memory sharing between the various CPU(s) and FPGA(s). For instance, as can be seen with respect to FIG. 36, a multiplicity of CPU cores 1000a, 1000b, and 1000c can be coupled together in such a manner so as to share one or more memories, e.g., DRAMs, and/or one or more caches having one or more layers or levels associated therewith. Likewise, with respect to FIG. 38, in another embodiment, a single CPU may be configured to include multiple cores 1000a, 1000b, and 1000c that can be coupled together in such a manner so as to share one or more memories, e.g., DRAMs, and/or one or more caches having one or more layers or levels associated therewith.

Hence, in either embodiment, data to be passed from one or more software threads 20 from one or more CPU cores 1000 to a hardware engine 13, or vice versa, may simply be updated in the shared memory 1014, or a cache thereof, visible to both devices. Even requests to process data in shared memory 1014, or notifications of results updated in shared memory, can be signaled between the software and hardware, such as over a DDR4 bus 1006, in queues implemented within the shared memory itself. Standard software mechanisms for control transfer and data protection, such as semaphores, mutexes, and atomic integers, can also be implemented similarly for software/hardware coordination.
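
The following sketch models that kind of coordination in software terms: requests and results exchanged through queues, protected by a lock and condition variable standing in for the semaphores or atomic integers mentioned above. It is an illustration only; in a real design these words would live in the coherent memory region visible to both the CPU threads and the FPGA engine.

```python
import threading

class SharedWorkQueue:
    """Software model of shared-memory queues used for software/hardware
    coordination: worker threads submit pointers to work units, the engine
    posts pointers to results, and waiting threads are woken on completion."""
    def __init__(self):
        self._lock = threading.Lock()
        self._ready = threading.Condition(self._lock)
        self._requests = []
        self._results = []

    def submit(self, work_ptr):
        with self._ready:
            self._requests.append(work_ptr)   # engine side polls or is notified
            self._ready.notify_all()

    def post_result(self, result_ptr):
        with self._ready:
            self._results.append(result_ptr)
            self._ready.notify_all()

    def wait_result(self):
        with self._ready:
            while not self._results:
                self._ready.wait()
            return self._results.pop(0)
```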

Consequently, in some embodiments, with no need for the FPGA 7 to have its own dedicated memory 14 or other external resources, due to cache-coherent memory sharing over a tight CPU/FPGA interconnect, it becomes much more practical to package the FPGA 7 more compactly and natively within traditional CPU 1000 motherboards, without the use of expansion cards. See, for example, FIGS. 34A and 34B and FIG. 35. Several packaging alternatives are available. Specifically, an FPGA 7 may be installed onto a multi-CPU motherboard in a CPU socket, as shown in FIGS. 34A and 34B, such as by use of an appropriate interposer, such as a small PC board 2, or alternative wire-bond packaging of an FPGA die within a CPU chip package 2a, to route CPU socket pins to FPGA pins, including power and ground, the processor interconnect 3 (QPI, HT, etc.), and system connections. Additionally, an FPGA die and a CPU die may be included in the same multi-chip package (MCP) with the necessary connections, including power, ground, and the CPU/FPGA interconnect, made within the package 2a. Inter-die connections may be made by die-to-die wire bonding, or by connection to a common substrate or interposer, or by bonded pads or through-silicon vias between stacked dice.

Further, FPGA and CPU cores may be fabricated on a single die, see FIG. 35, using system-on-a-chip (SOC) methodology. In any of these cases, custom logic, e.g., 17, may be instantiated inside the FPGA 7 to communicate over the CPU/FPGA interconnect 3 by its proper protocol, and to service and convert memory access requests from the internal FPGA engines 13 to the CPU/FPGA interconnect 3 protocols. Alternatively, some or all of this logic may be hardened into custom silicon, to avoid using up FPGA logic real estate for this purpose, such as where the hardened logic may reside on the CPU die, and/or the FPGA die, or a separate die. Also, in any of these cases, power supply and heat dissipation requirements may be obeyed appropriately; for example, within a single package (MCP or SOC), the FPGA size and CPU core count may be chosen to stay within a safe power envelope, or dynamic methods (clock frequency management, clock gating, core disabling, power islands, etc.) may be used to regulate power consumption according to changing FPGA and/or CPU computation demands.

All of these packaging options share several advantages. The tightly-integrated CPU/FPGA platform becomes compatible with standard motherboards and/or system chassis of a variety of sizes. If the FPGA is installed via an interposer (not shown) in a CPU socket, see FIGS. 34A and 34B, then at least a dual-socket motherboard 1002 may be employed, and, e.g., a quad-socket motherboard may be required to allow 3 CPUs + 1 FPGA, 2 CPUs + 2 FPGAs, or 1 CPU + 3 FPGAs, etc. If each FPGA resides in the same chip package as a CPU (either MCP or SOC), see FIG. 34B, then even a single-socket motherboard is adequate, potentially in a very small chassis (although a dual-socket motherboard is depicted); this also scales upward very well, e.g., 4 FPGAs and 4 multi-core CPUs on a 4-socket server motherboard, which nevertheless could operate in a compact chassis, such as a 1U rack-mount server.

In various instances, therefore, there may be no need for an expansion card to be installed so as to integrate the CPU and FPGA acceleration, because the FPGA 7 may be integrated into the CPU 1000 socket. This implementation avoids the extra space and power requirements of an expansion card, as well as the additional failure point, expansion cards sometimes being relatively low-reliability components. Furthermore, standard CPU cooling solutions (heat sinks, heat pipes, and/or fans), which are efficient yet low-cost since they are manufactured in high volumes, can be applied to FPGAs or CPU/FPGA packages in CPU sockets, whereas cooling for expansion cards can be expensive and inefficient.

Likewise, an FPGA/interposer or CPU/FPGA package may draw on the full power supply of a CPU socket, e.g., 150 W, whereas a standard expansion card may be power limited, e.g., to 25 W or 75 W from the PCIe bus. In various instances, for genomic data processing applications, all these packaging options may facilitate easy installation of a tightly-integrated CPU+FPGA compute platform, such as within a DNA sequencer. For instance, typical modern “next-generation” DNA sequencers contain the sequencing apparatus (sample and reagent storage, fluidics tubing and control, sensor arrays, primary image and/or signal processing) within a chassis that also contains a standard or custom server motherboard, wired to the sequencing apparatus for sequencing control and data acquisition. A tightly-integrated CPU+FPGA platform, as herein described, may be achieved in such a sequencer, such as by simply installing one or more FPGA/interposer or FPGA/CPU packages in CPU sockets of its existing motherboard, or alternatively by installing a new motherboard with both CPU(s) and FPGA(s).

Further, all of these packaging options may be configured to facilitate easy deployment of the tightly-integrated CPU+FPGA platform, such as into a cloud or datacenter server rack, which requires compact/dense servers and very high reliability/availability. Hence, in accordance with the teachings herein, there are many processing stages for data from DNA (or RNA) sequencing to mapping and aligning to variant calling, which can vary depending on the primary and/or secondary and/or tertiary processing technologies and the application. Such processing steps may include one or more of: signal processing on electrical measurements from a sequencer; image processing on optical measurements from the sequencer; base calling using processed signal or image data to determine the most likely nucleotide sequence and confidence scores; filtering sequenced reads with low quality or polyclonal clusters; detecting and trimming adapters, key sequences, barcodes, and low-quality read ends; De novo sequence assembly; generating and/or utilizing De Bruijn graphs and/or sequence graphs, e.g., De Bruijn and sequence graph construction, editing, trimming, cleanup, repair, coloring, annotation, comparison, transformation, splitting, splicing, analysis, subgraph selection, traversal, iteration, recursion, searching, filtering, import, and export; mapping reads to a reference genome; aligning reads to candidate mapping locations in the reference genome; local assembly of reads mapped to a reference region; sorting reads by aligned position; marking duplicate reads, including PCR or optical duplicates; re-alignment of multiple overlapping reads for indel consistency; base quality score recalibration; variant calling (single sample or joint); structural variant analysis; copy number variant analysis; somatic variant calling (e.g., tumor sample only, matched tumor/normal, or tumor/unmatched normal, etc.); RNA splice junction detection; RNA alternative splicing analysis; RNA transcript assembly; RNA transcript expression analysis; RNA differential expression analysis; RNA variant calling; DNA/RNA difference analysis; DNA methylation analysis and calling; variant quality score recalibration; variant filtering; variant annotation from known variant databases; sample contamination detection and estimation; phenotype prediction; disease testing; treatment response prediction; custom treatment design; ancestry and mutation history analysis; population DNA analysis; genetic marker identification; encoding genomic data into standard formats (e.g., FASTA, FASTQ, SAM, BAM, VCF, BCF); decoding genomic data from standard formats; querying, selecting, or filtering genomic data subsets; general compression and decompression for genomic files (gzip, BAM compression); specialized compression and decompression for genomic data (CRAM); genomic data encryption and decryption; statistics calculation, comparison, and presentation from genomic data; genomic result data comparison, accuracy analysis, and reporting; genomic file storage, archival, retrieval, backup, recovery, and transmission; as well as genomic database construction, querying, access management, data extraction, and the like.

All of these operations can be quite slow and expensive when implemented on traditional compute platforms. The sluggishness of such exclusively software-implemented operations may be due in part to the complexity of the algorithms, but is typically due to the very large input and output datasets, which result in high latency with respect to moving the data. The devices and systems disclosed herein overcome these problems, in part due to the configuration of the various hardware processing engines and/or in part due to the CPU/FPGA coupling configurations. Accordingly, as can be seen with respect to FIG. 39, one or more, e.g., all, of these operations may be accelerated by cooperation of CPUs 1000 and FPGAs 7, such as in a distributed processing model, as described herein. For instance, in some cases (encryption, general compression, read mapping, and/or alignment), a whole operational function may be substantially or entirely implemented in custom FPGA logic (such as by hardware design methodology, e.g., RTL), such as where the CPU software mostly serves the function of compiling large data packets for preprocessing via worker threads 20, such as by aggregating the data into various jobs to be processed by one or more hardware-implemented processing engines, feeding the various data inputs, such as in a first-in first-out format, to one or more of the FPGA engine(s) 13, and/or receiving results therefrom.

For instance, as can be seen with respect to FIG. 39, in various embodiments, a worker thread generates various packets of job data that may be compiled and/or streamed into larger job packets that may be queued up and/or further aggregated in preparation for transfer, e.g., via a DDR3, to the FPGA 7, such as over a high-bandwidth, low-latency, point-to-point interconnect protocol, e.g., QPI 3. In particular instances, the data may be buffered in accordance with the particular data sets being transferred to the FPGA. Once the packaged data is received by the FPGA 7, such as in a cache-coherent manner, it may be processed and sent to one or more specialized clusters 11, whereby it may further be directed to one or more sets of processing engines for processing thereby in accordance with one or more of the pipeline operations herein described.

Once processed, results data may then be sent back to the cluster and queued up for being sent back over the tightly coupled point-to-point interconnect to the CPU for post-processing. In certain embodiments, the data may be sent to a de-aggregator thread prior to post-processing. Once post-processing has occurred, the data may be sent back to the initial worker thread 20 that may be waiting on the data. Such distributed processing is particularly beneficial for the functions herein disclosed above. Particularly, these functions are distinguishable by the fact that their algorithmic complexity (although having a very high net computational burden) is fairly limited, and they each may be configured so as to have a fairly uniform compute cost across their various sub-operations.

However, in various cases, rather than processing the data in large packets, smaller sub-routines or discrete function protocols or elements may be performed, such as pertaining to one or more functions of a pipeline, rather than performing the entire processing functions for that pipeline on that data. Hence, a useful strategy may be to identify one or more critical compute-intensive sub-functions in any given operation, and then implement that sub-function in custom FPGA logic (hardware acceleration), such as for the intensive sub-function(s), while implementing the balance of the operation, and ideally much or most of the algorithmic complexity, in software to run on CPUs, as described herein, such as with respect to FIG. 39.

Generally, it is typical of many genomic data processing operations that a small percentage of the algorithmic complexity accounts for a large percentage of the overall computing load. For instance, as a typical example, 20% of the algorithmic complexity for the performance of a given function may account for 90% of the compute load, while the remaining 80% of the algorithmic complexity may only account for 10% of the compute load. Hence, in various instances, the system components herein described may be configured so as to implement the high-compute-load portion, e.g., the 20% of the complexity, so as to run very efficiently in custom FPGA logic, which may be tractable and maintainable in a hardware design, and thus may be configured for execution in the FPGA; this in turn may reduce the CPU compute load by 90%, thereby enabling 10× overall acceleration. Other typical examples may be even more extreme, such as where 10% of the algorithmic complexity may account for 98% of the compute load, in which case applying FPGA acceleration, as herein described, to the 10% complexity portion may be even easier, and may also enable up to 50× net acceleration.
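
The acceleration figures quoted above follow from the standard speed-up relation for offloading a fraction p of the compute load with per-portion speed-up s; the simplified form below assumes, for illustration, that the offloaded portion runs in negligible time on the FPGA:

```latex
S \;=\; \frac{1}{(1-p) + p/s} \;\approx\; \frac{1}{1-p} \quad (s \gg 1),
\qquad
p = 0.90 \;\Rightarrow\; S \approx 10\times,
\qquad
p = 0.98 \;\Rightarrow\; S \approx 50\times .
```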

However, such a “piecemeal” or distributed processing acceleration approach may be more practical when implemented on a tightly integrated CPU+FPGA platform, rather than on a loosely integrated CPU+FPGA platform. Particularly, in a loosely integrated platform, the portion, e.g., the functions, to be implemented in FPGA logic may be selected so as to minimize the size of the input data to the FPGA engine(s), and to minimize the output data from the FPGA engine(s), such as for each data unit processed, and additionally may be configured so as to keep the software/hardware boundary tolerant of high latencies. In such instances, the boundary between the hardware and software portions may be forced, e.g., on the loosely-integrated platform, to be drawn through certain low-bandwidth/high-latency cut-points, which divisions may not otherwise be desirable when optimizing the partitioning of the algorithmic complexity and computational loads. This may often result either in enlarging the boundaries of the hardware portion, encompassing an undesirably large portion of the algorithmic complexity in the hardwired format, or in shrinking the boundaries of the hardware portion, undesirably excluding portions with dense compute load.

By contrast, on a tightly integrated CPU+FPGA platform, due to the cache-coherent shared memory and the high-bandwidth/low-latency CPU/FPGA interconnect, the low-complexity/high-compute-load portions of a genomic data processing operation can be selected very precisely for implementation in custom FPGA logic (e.g., via the hardware engine(s) described herein), with optimized software/hardware boundaries. In such an instance, even if a data unit is large at the desired software/hardware boundary, it can still be efficiently handed off to an FPGA hardware engine for processing, just by passing a pointer to the particular data unit. Particularly, in such an instance, as per FIG. 33B, the hardware engine 13 of the FPGA 7 may not need to access every element of the data unit stored within the DRAM 1014; rather, it can access the necessary elements, e.g., within the cache 1014a, with efficient small accesses over the low-latency interconnect 3′ serviced by the CPU cache, thereby consuming less aggregate bandwidth than if the entire data unit had to be accessed and/or transferred to the FPGA 7, such as by DMA of the DRAM 1014, over a loose interconnect 3, as per FIG. 33A.

In such instances, the hardware engine 13 can annotate processing results into the data unit in-place in CPU memory 1014, without streaming an entire copy of the data unit by DMA to CPU memory. Even if the desired software/hardware boundary is not appropriate for a software thread 20 to make a high-latency, non-blocking, queued handoff to the hardware engine 13, it can potentially make a blocking function call to the hardware engine 13, sleeping for a short latency until the hardware engine completes, the latency being dramatically reduced by the cache-coherent shared memory, the low-latency/high-bandwidth interconnect, and the distributed software/hardware coordination model, as in FIG. 33B.

In particular instances, because the specific algorithms and requirements of signal/image processing and base calling vary from one sequencer technology to another, and because the quantity of raw data from the sequencer's sensor is typically gargantuan (this being reduced to enormous after signal/image processing, and to merely huge after base calling), such signal/image processing and base calling may be efficiently performed within the sequencer itself, or on a nearby compute server connected by a high-bandwidth transmission channel to the sequencer. However, DNA sequencers have been achieving increasingly high throughputs, at a rate of increase exceeding Moore's Law, such that existing Central Processing Unit (“CPU”) and/or Graphics Processing Unit (“GPU”) based signal/image processing and base calling, when implemented individually and alone, have become increasingly inadequate to the task. Nevertheless, since a tightly integrated CPU+FPGA and/or GPU+FPGA and/or GPU/CPU+FPGA platform can be configured to be compact and easily instantiated within such a sequencer, e.g., as a CPU and/or GPU and/or FPGA chip positioned on the sequencer's motherboard, or easily installed in a server adjacent to the sequencer, or a cloud-based server system accessible remotely from the sequencer, such a sequencer may be an ideal platform to offer the massive compute acceleration provided by the custom FPGA/ASIC hardware engines described herein.

For instance, a system may be provided so as to perform primary, secondary, and/or tertiary processing, or portions thereof, as herein described, so as to be implemented by a CPU, GPU, and/or FPGA; a CPU+FPGA; a GPU+FPGA; and/or a GPU/CPU+FPGA platform. Further, such accelerated platforms, e.g., including one or more FPGA hardware engines, are useful for implementation in cloud-based systems, as described herein. For example, signal/image processing, base calling, mapping, aligning, sorting, and/or variant calling algorithms, or portions thereof, generally require large amounts of floating-point and/or fixed-point math, notably additions and multiplications. These functions can also be configured so as to be performed by one or more quantum processing circuits, such as to be implemented in a quantum processing platform.

Particularly, large modern FPGAs/quantum circuits contain thousands of high-speed multiplication and addition resources, and custom engines implemented on or by them can perform parallel arithmetic operations at rates far exceeding the capabilities of simple general CPUs. Likewise, GPUs have comparable parallel arithmetic resources, but they often have awkward architectural limitations and programming restrictions that may prevent them from being fully utilized; whereas FPGA arithmetic resources, as implemented herein, can be wired up or otherwise configured by design to operate in exactly the designed manner with near 100% efficiency, such as for performing the calculations necessary to perform the functions herein. Accordingly, GPU cards may be added to expansion slots on a motherboard with a tightly integrated CPU and/or FPGA, thereby allowing all three processor types to cooperate, although the GPU may still cooperate subject to all of its own limitations and the limitations of loose integration.

More particularly, in various instances, with respect to Graphics Processing Units (GPUs), a GPU can be configured so as to implement one or more of the functions, as herein described, so as to accelerate the processing speed of the underlying calculations necessary for performing that function, in whole or in part. More particularly, a GPU may be configured to perform one or more tasks in a mapping, aligning, sorting, and/or variant calling protocol, such as to accelerate one or more of the computations, e.g., the large amounts of floating-point and/or fixed-point math, such as additions and multiplications, involved therein, so as to work in conjunction with a server's CPU and/or FPGA to accelerate the application and processing performance and shorten the computational cycles required for performing such functions. Cloud servers, as herein described, with GPU/CPU/FPGA cards may be configured so as to easily handle compute-intensive tasks and deliver a smoother user experience when leveraged for virtualization.

Accordingly, if a tightly integrated CPU+FPGA or GPU+FPGA and/or CPU/GPU/FPGA with shared memory platform is employed within a sequencer or attached server for signal/image processing, base calling, mapping, aligning, sorting, and/or variant calling functions, an advantage may be achieved, such as in an incremental development process. For instance, initially, a limited portion of the compute load, such as a dynamic programming function for base calling, mapping, aligning, and/or variant calling, may be implemented in one or more FPGA engines, whereas other work may be done in the CPU and/or GPU expansion cards. However, the tight CPU/GPU/FPGA integration and shared memory model may be further configured, later, so as to make it easy to incrementally select additional compute-intensive functions for GPU and/or FPGA acceleration, which may then be implemented as FPGA hardware engines, and various of their functions may be offloaded for execution into the FPGA(s), thereby accelerating signal/image/base calling/mapping/aligning/variant processing. Such incremental advances can be implemented as needed to keep up with the increasing throughput of various primary and/or secondary and/or tertiary processing technologies.

Hence, read mapping and alignment, e.g., of one or more reads to a reference genome, as well as sorting and/or variant calling, may benefit from such FPGA and/or GPU acceleration. Specifically, mapping and alignment and/or variant calling, or portions thereof, may be implemented partially or entirely as custom FPGA logic, such as with the “to be aligned” reads streaming from the CPU/GPU memory into the FPGA map/align engines, and mapped and/or aligned read records streaming back out, which may further be streamed back on-board, such as in the performance of sorting and/or variant calling. This type of FPGA acceleration works on a loosely-integrated CPU/GPU+FPGA platform, and in the configurations described herein may be extremely fast. Nevertheless, there are some additional advantages that may be gained by moving to a tightly-integrated CPU+FPGA platform.

Accordingly, with respect to mapping and aligning and variant calling, in some embodiments, a shared advantage of a tightly-integrated CPU/GPU+FPGA and/or quantum processing platform, as described herein, is that the map/align/variant calling acceleration, e.g., hardware acceleration, can be efficiently split into several discrete compute-intensive operations, such as seed generation and/or mapping, seed chain formation, paired-end rescue scans, gapless alignment, gapped alignment (Smith-Waterman or Needleman-Wunsch), De Bruijn graph formation, performing an HMM computation, and the like, such as where the CPU and/or GPU and/or quantum computing software performs lighter (but not necessarily less complex) tasks, and may make acceleration calls to discrete hardware and/or other quantum computing engines as needed. Such a model may be less efficient on a typical loosely-integrated CPU/GPU+FPGA platform, e.g., due to large amounts of data to transfer back and forth between steps and high latencies, but may be more efficient on a tightly-integrated CPU+FPGA, GPU+FPGA, and/or quantum computing platform with cache-coherent shared memory, a high-bandwidth/low-latency interconnect, and a distributed software/hardware coordination model. Additionally, such as with respect to variant calling, both Hidden Markov Model (HMM) and/or dynamic programming (DP) algorithms, including Viterbi and forward algorithms, may be implemented in association with a base calling/mapping/aligning operation, such as to compute the most likely original sequence explaining the observed sensor measurements, in a configuration so as to be well suited to the parallel cellular layout of the FPGAs and quantum circuits described herein.

Specifically, an efficient utilization of hardware and/or software resources in a distributed processing configuration can result from reducing hardware and/or quantum computing acceleration to discrete compute-intensive functions. In such instances, several of the functions disclosed herein, as performed in a monolithic pure-hardware engine, may be less compute-intensive, but may nevertheless still be algorithmically complex, and therefore may consume large quantities of physical FPGA resources (lookup tables, flip-flops, block RAMs, etc.). In such instances, moving a portion or all of various discrete functions to software could take up available CPU cycles, in return for relinquishing substantial amounts of FPGA area. In certain of these instances, the freed FPGA area can be used for establishing greater parallelism for the compute-intensive map/align/variant call sub-functions, thus increasing acceleration, or for other genomic acceleration functions. Such benefits may also be achieved by implementing compute-intensive functions in one or more dedicated quantum circuits for implementation by a quantum computing platform.

Hence, in various embodiments, the algorithmic complexity of the one or more functions disclosed herein may be somewhat lessened by being configured in a pure hardware or pure quantum computing implementation. However, some operations, such as comparing pairs of candidate alignments for paired-end reads, and/or performing subtle mapping quality (MAPQ) estimations, represent very low compute loads, and thus could benefit from more complex and accurate processing in CPU/GPU and/or quantum computing software. Hence, in general, reducing the hardware processing to specific compute-intensive operations would allow more complex and accurate algorithms to be employed in the CPU/GPU portions.

Furthermore, the whole map/align/variant call operation could be configured so as to employ more algorithmic complexity at high levels, such as by calling compute-intensive hardware functions in a dynamic order or iteratively, whereas a monolithic pure-hardware/quantum processing design may be implemented in a manner so as to function more efficiently as a linear pipeline. For example, if during processing one Smith-Waterman alignment displayed evidence of the true alignment path escaping the scoring band, e.g., the swath described above, another Smith-Waterman alignment could be called to correct this. Hence, these configurations could essentially reduce the FPGA hardware/quantum acceleration to discrete functions, a form of procedural abstraction, which would allow higher-level complexity to be built easily on top of it.
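
By way of illustration only, the following Python sketch shows what such a procedural abstraction might look like in the host software; the engine handle, the smith_waterman call, and the hit_band_edge flag are all hypothetical names used for explanation, not an actual API of the disclosed hardware.

    def align_with_rescue(engine, read, ref_window, band=48, max_band=384):
        # Call a (hypothetical) banded Smith-Waterman hardware engine; if the
        # best-scoring path touches the edge of the scoring band, call the
        # engine again with a wider band to rescue the alignment.
        result = engine.smith_waterman(read, ref_window, band=band)
        while result.hit_band_edge and band < max_band:
            band *= 2
            result = engine.smith_waterman(read, ref_window, band=band)
        return result

In this model the hardware exposes only the compute-intensive alignment call, while the decision of whether and how to re-issue that call remains in software.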

Additionally, in various instances, flexibility within the map/align/variant calling algorithms and features thereof may be improved by reducing hardware and/or quantum acceleration to discrete compute-intensive functions, and configuring the system so as to perform other, e.g., less intensive, parts in the software of the CPU and/or GPU. For instance, although hardware algorithms can be modified and reconfigured in FPGAs, generally such changes to the hardware designs, e.g., via firmware, may require several times as much design effort as similar changes to software code. In such instances, the compute-intensive portions of mapping and alignment and/or variant calling, such as seed mapping, seed chain formation, paired end rescue scans, gapless alignment, gapped alignment, and HMM, are relatively well defined, and thus are stable functions that do not require frequent algorithmic changes. These functions, therefore, may be suitably optimized in hardware, whereas other functions, which could be executed by CPU/GPU software, are more appropriate for incremental improvement of algorithms, which is significantly easier in software. However, once fully developed, such functions could also be implemented in hardware. One or more of these functions may also be configured so as to be implemented in one or more quantum circuits of a quantum processing machine.

Accordingly, in various instances, variant calling (with respect to DNA or RNA, single sample or joint, germline or somatic, etc.) may also benefit from FPGA and/or quantum acceleration, such as with respect to its various compute-intensive functions. For instance, haplotype-based calling, which calls bases on evidence derived from a context provided within a window around a potential variant, as described above, is often the most compute-intensive operation. These operations include comparing a candidate haplotype (e.g., a single-strand nucleotide sequence representing a theory of the true sequence of at least one of the sampled strands at the genome locus in question) to each sequencer read, such as to estimate the conditional probability of observing the read given the truth of the haplotype.

Such an operation may be performed via one or more of an MRJD, PairHidden Markov Model (pair-HMM), and/or a Pair-Determined Hidden MarkovModel (PD-HMM) calculation that sums the probabilities of possiblecombinations of errors in sequencing or sample preparation (PCR, etc.)by a dynamic programming algorithm. Hence, with respect thereto, thesystem can be configured such that a pair-HMM or PD-HMM calculation maybe accelerated by one or more, e.g., parallel, FPGA hardware or quantumprocessing engines, whereas the CPU/GPU/QPU software may be configuredso as to execute the remainder of the parent haplotype-based variantcalling algorithm, either in a loosely-integrated or tightly-integratedCPU+FPGA, or GPU+FPGA or CPU and/or GPU+FPGA and/or QPU platform. Forinstance, in a loose integration, software threads may construct andprepare a De Bruijn and/or assembly graph from the reads overlapping achosen active region (a window or contiguous subset of the referencegenome), extract candidate haplotypes from the graph, and queue uphaplotype-read pairs for DMA transfer to FPGA hardware engines, such asfor pair-HMM or PD-HMM comparison. The same or other software threadscan then receive the pair-HMM results queued and DMA-transferred backfrom the FPGA into the CPU/GPU memory, and perform genotyping andBayesian probability calculations to make final variant calls. Ofcourse, one or more of these functions can be configured so as to be runon one or more quantum computing platforms.
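
Purely for illustration, the following minimal Python sketch shows the kind of dynamic-programming sum such a pair-HMM engine performs: it sums the probabilities of all alignments of a read to a candidate haplotype, given per-base quality scores. The transition and emission model here is deliberately simplified and its parameters are assumptions, not the production implementation described herein.

    import numpy as np

    def pair_hmm_forward(hap, read, base_quals, gap_open=1e-3, gap_ext=1e-1):
        # Simplified pair-HMM forward pass: M = match/mismatch state,
        # I = insertion in the read, D = deletion from the read.
        m, n = len(read), len(hap)
        M = np.zeros((m + 1, n + 1))
        I = np.zeros((m + 1, n + 1))
        D = np.zeros((m + 1, n + 1))
        D[0, :] = 1.0 / n                 # read may start anywhere along the haplotype
        cont = 1.0 - 2.0 * gap_open       # probability of remaining in the match state
        for i in range(1, m + 1):
            err = 10.0 ** (-base_quals[i - 1] / 10.0)
            for j in range(1, n + 1):
                emit = (1.0 - err) if read[i - 1] == hap[j - 1] else err / 3.0
                M[i, j] = emit * (cont * M[i - 1, j - 1]
                                  + (1.0 - gap_ext) * (I[i - 1, j - 1] + D[i - 1, j - 1]))
                I[i, j] = gap_open * M[i - 1, j] + gap_ext * I[i - 1, j]
                D[i, j] = gap_open * M[i, j - 1] + gap_ext * D[i, j - 1]
        # The read may end anywhere along the haplotype.
        return float(M[m, :].sum() + I[m, :].sum())

Each haplotype-read pair requires a full m-by-n table of this kind, which is why the inner loops are the natural target for parallel FPGA or quantum processing engines while the surrounding caller logic remains in software.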

For instance, as can be seen with respect to FIG. 38, the CPU/GPU 1000 may include one or more, e.g., a plurality, of threads 20 a, 20 b, and 20 c, which may each have access to an associated DRAM 1014, which DRAM has work spaces 1014 a, 1014 b, and 1014 c, to which each thread 20 a, 20 b, and 20 c may have access, respectively, so as to perform one or more operations on one or more data structures, such as large data structures. These memory portions and their data structures may be accessed, such as via respective cache portions 1014 a′, by one or more processing engines 13 a, 13 b, 13 c of the FPGA 7, which processing engines may access the referenced data structures in the performance of one or more of the operations herein described, such as mapping, aligning, sorting, and/or variant calling. Because of the high-bandwidth, tight-coupling interconnect 3, data pertaining to the data structures and/or related to the processing results may be shared substantially seamlessly between the CPU and/or GPU and/or QPU and the associated FPGA, such as in a cache-coherent manner, so as to optimize processing efficiency.

Accordingly, in one aspect, as herein disclosed, a system may be provided wherein the system is configured for sharing memory resources amongst its component parts, such as in relation to performing some computational tasks or sub-functions via software, such as run by a CPU and/or GPU and/or QPU, and performing other computational tasks or sub-functions via firmware, such as via the hardware of an associated chip, such as an FPGA and/or ASIC or structured ASIC. This may be achieved in a number of different ways, such as by a direct loose or tight coupling between the CPU/GPU/QPU and the chip, e.g., FPGA. Such configurations may be particularly useful when distributing operations related to the processing of large data structures, as herein described, that have large functions or sub-functions to be used and accessed by both the CPU and/or GPU and/or QPU and the integrated circuit. Particularly, in various embodiments, when processing data through a genomics pipeline, as herein described, such as to accelerate overall processing function, timing, and efficiency, a number of different operations may be run on the data, which operations may involve both software and hardware processing components.

Consequently, data may need to be shared and/or otherwise communicated between the software component running on the CPU and/or GPU and/or QPU and the hardware component embodied in the chip, e.g., an FPGA or ASIC. Accordingly, one or more of the various steps in the processing pipeline, or a portion thereof, may be performed by one device, e.g., the CPU/GPU/QPU, and one or more of the various steps may be performed by the other device, e.g., the FPGA or ASIC. In such an instance, the CPU and the FPGA need to be communicably coupled, such as by a point-to-point interconnect, in such a manner as to allow the efficient transmission of such data, which coupling may involve the shared use of memory resources. To achieve such distribution of tasks and the sharing of information for the performance of such tasks, the CPU and/or GPU and/or QPU may be loosely or tightly coupled to the FPGA, or other chipset.

Hence, in particular embodiments, a genomics analysis platform is provided. For instance, the platform may include a motherboard, a memory, and a plurality of integrated circuits, such as forming one or more of a CPU/GPU/QPU, a mapping module, an alignment module, a sorting module, and/or a variant call module. Specifically, in particular embodiments, the platform may include a first integrated circuit, such as an integrated circuit forming a central processing unit (CPU) and/or a graphics processing unit (GPU) that is responsive to one or more software algorithms that are configured to instruct the CPU/GPU to perform one or more sets of genomics analysis functions, as described herein, such as where the CPU/GPU includes a first set of physical electronic interconnects to connect with the motherboard. In particular embodiments, a quantum processing unit (QPU) is provided, wherein the QPU includes one or more quantum circuits that are configured for performing one or more of the functions disclosed herein. In various instances, the memory may also be attached to the motherboard and may further be electronically connected with the CPU and/or GPU and/or QPU, such as via at least a portion of the first set of physical electronic interconnects. In such instances, the memory may be configured for storing a plurality of reads of genomic data, and/or at least one or more genetic reference sequences, and/or an index, e.g., a hash table, of the one or more genetic reference sequences.

Additionally, the platform may include one or more second integrated circuits, such as where each second integrated circuit forms a field programmable gate array (FPGA) having a second set of physical electronic interconnects to connect with the CPU and the memory, such as via a point-to-point interconnect protocol. In such an instance, the FPGA may be programmable by firmware to configure a set of hardwired digital logic circuits that are interconnected by a plurality of physical interconnects to perform a second set of genomics analysis functions, e.g., mapping, aligning, sorting, and/or variant calling, e.g., an HMM function, etc. Particularly, the hardwired digital logic circuits of the FPGA may be arranged as a set of processing engines to perform one or more pre-configured steps in a sequence analysis pipeline of the genomics analysis, such as where the set(s) of processing engines include one or more of a mapping and/or aligning and/or sorting and/or variant call module, which modules may be formed of separate or the same subsets of processing engines.

For instance, with respect to variant calling, the pair-HMM or PD-HMM calculation is one of the most compute-intensive steps of haplotype-based variant calling. Hence, variant calling speed may be greatly improved by accelerating this step in one or more FPGA or quantum processing engines, as herein described. However, there may be additional benefit in accelerating other compute-intensive steps in additional FPGA and/or QP engines, to achieve a greater speed-up of variant calling, or a portion thereof, or to reduce the CPU/GPU load and the number of CPU/GPU cores required, or both, as seen with respect to FIG. 38.

Additional compute-intensive functions, with respect to variant calling, that may be implemented in FPGA and/or quantum processing engines include: callable-region detection, where reference genome regions covered by adequate depth and/or quality of aligned reads are selected for processing; active-region detection, where reference genome loci with nontrivial evidence of possible variants are identified, and windows of sufficient context around these loci are selected as active regions for further processing (see the sketch following this paragraph); De Bruijn or other assembly graph construction, where reads overlapping an active region and/or k-mers from those reads are assembled into a graph; assembly graph preparation, such as trimming low-coverage or low-quality paths, repairing dangling head and tail paths by joining them onto a reference backbone in the graph, transformation from a k-mer to a sequence representation of the graph, and merging similar branches and otherwise simplifying the graph; extracting candidate haplotypes from the assembly graph; as well as aligning candidate haplotypes to the reference genome, such as by Smith-Waterman alignment, e.g., to determine the variants (SNPs and/or indels) from the reference represented by each haplotype, and to synchronize its nucleotide positions with the reference.
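
As a hedged illustration of the active-region detection step named above, the following Python sketch selects padded windows around loci whose per-position variant evidence exceeds a threshold; the evidence scoring, threshold, and padding are assumptions chosen only for explanation, not the specific heuristics of the disclosed engines.

    def find_active_regions(evidence, threshold=2, padding=75):
        # `evidence` is a per-position score (e.g., mismatch + indel counts
        # from the pileup). Positions above the threshold seed a window of
        # `padding` bases on each side; overlapping windows are merged.
        regions = []
        for pos, score in enumerate(evidence):
            if score >= threshold:
                start, end = max(0, pos - padding), pos + padding
                if regions and start <= regions[-1][1]:
                    regions[-1] = (regions[-1][0], max(regions[-1][1], end))
                else:
                    regions.append((start, end))
        return regions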

All of these functions may be implemented as high-performance hardware engines within the FPGA, and/or by one or more quantum circuits of a quantum computing platform. However, calling such a variety of hardware acceleration functions from many integration points in the variant calling software may become inefficient on a loosely-coupled CPU/GPU/QPU+FPGA platform, and therefore a tightly-integrated CPU/GPU/QPU+FPGA platform may be desirable. For instance, various stepwise processing methods, such as constructing, preparing, and extracting haplotypes from a De Bruijn or other assembly graph, could strongly benefit from a tightly-integrated CPU/GPU/QPU+FPGA platform. Additionally, assembly graphs are large and complex data structures, and passing them repeatedly between the CPU and/or GPU and the FPGA could become resource expensive and inhibit significant acceleration.

Hence, an ideal model for such graph processing, employing a tightly-integrated CPU/GPU/QPU and/or FPGA platform, is to retain such graphs in cache-coherent shared memory for alternating processing by CPU and/or GPU and/or QPU software and FPGA hardware functions. In such an instance, a software thread processing a given graph may iteratively command various compute-intensive graph processing steps to be performed by a hardware engine, and then the software could inspect the results and determine the next steps between the hardware calls. This processing model may be configured to correspond to software paradigms such as a data-structure API or an object-oriented method interface, but with the compute-intensive functions being accelerated by custom hardware engines, which is made practical by being implemented on a tightly-integrated CPU and/or GPU and/or QPU+FPGA platform, with cache-coherent shared memory and high-bandwidth/low-latency CPU/GPU/QPU/FPGA interconnects.
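
The following Python sketch illustrates, in outline only, how such a software thread might alternate between lightweight inspection in software and compute-intensive calls to hardware engines operating on a graph held in cache-coherent shared memory. The fpga handle and its method names (build_debruijn_graph, trim_low_coverage_paths, merge_similar_branches, extract_haplotypes) are hypothetical placeholders used to explain the coordination model, not an actual API.

    def assemble_active_region(reads, fpga):
        # Compute-intensive graph construction is issued to the hardware engine;
        # the resulting graph stays in shared memory rather than being copied.
        graph = fpga.build_debruijn_graph(reads)
        # Lightweight inspection in software decides which accelerated call to
        # make next, iterating until the graph is sufficiently prepared.
        while graph.has_low_coverage_paths():
            fpga.trim_low_coverage_paths(graph)
        fpga.merge_similar_branches(graph)
        return fpga.extract_haplotypes(graph)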

Accordingly, in addition to mapping and aligning sequencer reads to a reference genome, reads may be assembled "de novo," e.g., without a reference genome, such as by detecting apparent overlap between reads, e.g., in a pileup, where they fully or mostly agree, and joining them into longer sequences, contigs, scaffolds, or graphs. This assembly may also be done locally, such as using all reads determined to map to a given chromosome or portion thereof. Assembly in this manner may also incorporate a reference genome, or a segment of one, into the assembled structure.

In such an instance, due to the complexity of joining together read sequences that do not completely agree, a graph structure may be employed, such as where overlapping reads may agree on a single sequence in one segment, but branch into multiple sequences in an adjacent segment, as explained above. Such an assembly graph, therefore, may be a sequence graph, where each edge or node represents one nucleotide or a sequence of nucleotides that is considered to adjoin contiguously to the sequences in connected edges or nodes. In particular instances, such an assembly graph may be a k-mer graph, where each node represents a k-mer, or nucleotide sequence of (typically) fixed length k, and where connected nodes are considered to overlap each other in longer observed sequences, typically overlapping by k−1 nucleotides. In various methods there may be one or more transformations performed between one or more sequence graphs and k-mer graphs.
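
As a minimal sketch of the k-mer graph concept described above (illustrative only; any production engine would add error correction, coverage filtering, and compaction), the following Python function builds a simple De Bruijn-style graph in which nodes are (k−1)-mers and each edge records how many times a k-mer, i.e., an overlap of k−1 nucleotides, was observed in the reads:

    from collections import defaultdict

    def build_kmer_graph(reads, k=25):
        # Edge (left, right) means the (k-1)-mer `left` is followed by the
        # (k-1)-mer `right` within some read; the value counts observations.
        edges = defaultdict(int)
        for read in reads:
            for i in range(len(read) - k + 1):
                kmer = read[i:i + k]
                edges[(kmer[:-1], kmer[1:])] += 1
        return edges

    # Example: build_kmer_graph(["ACGTACGT", "CGTACGTT"], k=4)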

Although assembly graphs are employed in haplotype-based variant calling, and some of the graph processing methods employed are similar, there are important differences. De novo assembly graphs are generally much larger, and employ longer k-mers. Whereas variant-calling assembly graphs are constrained to be fairly structured and simple, such as having no cycles and flowing source-to-sink along a reference sequence backbone, de novo assembly graphs tend to be more unstructured and complex, with cycles, dangling paths, and other anomalies not only permitted, but subjected to special analysis. De novo assembly graph coloring is sometimes employed, assigning "colors" to nodes and edges signifying, for example, which biological sample they came from, or whether they match a reference sequence. Hence, a wider variety of graph analysis and processing functions need to be employed for de novo assembly graphs, often iteratively or recursively, and, especially due to the size and complexity of de novo assembly graphs, these processing functions tend to be extremely compute intensive.

Hence, as set forth above, an ideal model for such graph processing, on a tightly-integrated CPU/GPU/QPU+FPGA platform, is to retain such graphs in cache-coherent shared memory for alternating processing between the CPU/GPU/QPU software and FPGA hardware functions. In such an instance, a software thread processing a given graph may iteratively command various compute-intensive graph processing steps to be performed by a hardware engine, and then inspect the results to thereby determine the next steps to be performed by the hardware, such as by making appropriate hardware calls. As above, this processing model is greatly benefitted by implementation on a tightly-integrated CPU+FPGA platform, with cache-coherent shared memory and a high-bandwidth/low-latency CPU/FPGA interconnect.

Additionally, as described herein below, tertiary analysis includes genomic processing that may follow variant calling, which in clinical applications may include variant annotation, phenotype prediction, disease testing, and/or treatment response prediction, as described herein. Among the reasons it is beneficial to perform tertiary analysis on such a tightly-integrated CPU/GPU/QPU+FPGA platform are that such a platform configuration enables efficient acceleration of primary and/or secondary processing, which are very compute intensive, and that it is ideal to continue with tertiary analysis on the same platform, for convenience and reduced turnaround time, and to minimize transmission and copying of large genomic data files. Hence, either a loosely or tightly-integrated CPU/GPU/QPU+FPGA platform is a good choice, but a tightly coupled platform may include additional benefits, because tertiary analysis steps and methods vary widely from one application to another, and in any case where compute-intensive steps slow down tertiary analysis, custom FPGA acceleration of those steps can be implemented in an optimized fashion.

For instance, a particular benefit of tertiary analysis on a tightly-integrated CPU/GPU/QPU+FPGA platform is the ability to re-analyze the genomic data iteratively, leveraging the CPU/GPU/QPU and/or FPGA acceleration of secondary processing, in response to partial or intermediate tertiary results, which may benefit additionally from the tight integration configuration. For example, after tertiary analysis detects a possible phenotype or disease, but with limited confidence as to whether the detection is true or false, focused secondary re-analysis may be performed with extremely high effort on the particular reads and reference regions impacting the detection, thus improving the accuracy and confidence of the relevant variant calls, and in turn improving the confidence in the detection call. Additionally, if tertiary analysis determines information about the ancestry or structural variant genotypes of the analyzed individual, secondary analysis may be repeated using a different or modified reference genome that is more appropriate for the specific individual, thus enhancing the accuracy of variant calls and improving the accuracy of further tertiary analysis steps.

However, if tertiary analysis is done on a CPU-only platform after primary and secondary processing (possibly accelerated on a separate platform), then re-analysis with secondary processing tools is likely to be too slow to be useful on the tertiary analysis platform itself, and the alternative, transmission to a faster platform, is also prohibitively slow. Thus, in the absence of any form of hardware or quantum acceleration on the tertiary analysis platform, primary and secondary processing must generally be completed before tertiary analysis begins, without the possibility of easy re-analysis or iterative secondary analysis. But on an FPGA-accelerated platform, and especially a tightly-integrated CPU and/or GPU and/or QPU and/or FPGA platform where secondary processing is maximally efficient, iterative analysis becomes practical and useful.

Accordingly, as indicated above, the modules herein disclosed may be implemented in the hardware of the chip, such as by being hardwired therein, and in such instances their implementation may be such that their functioning may take place at a faster speed, and with greater accuracy, as compared to when implemented in software, such as where there are minimal instructions to be fetched, read, and/or executed. Additionally, in various instances, the functions to be performed by one or more of these modules may be distributed such that various of the functions may be configured so as to be implemented by the host CPU and/or GPU and/or QPU software, whereas in other instances, various other functions may be performed by the hardware of an associated FPGA, such as where the two or more devices perform their respective functions with one another in a seamless fashion. For such purposes, the CPU, GPU, QPU, and/or FPGA may be tightly coupled, such as via a low-latency, high-bandwidth interconnect, such as a QPI, CCVI, CAPI, and the like. Accordingly, in some instances, the highly computationally intensive functions to be performed by one or more of these modules may be performed by a quantum processor implemented by one or more quantum circuits.

Hence, given the unique hardware and/or quantum processing implementation, the modules of the disclosure may function directly in accordance with their operational parameters, such as without needing to fetch, read, and/or execute instructions, as when implemented solely in CPU software. Additionally, memory requirements and processing times may be further reduced, such as where the communication within the chip is via files, e.g., stored locally in the FPGA/CPU/GPU/QPU cache, such as in a cache-coherent manner, rather than through extensive accessing of an external memory. Of course, in some instances, the chip and/or card may be sized so as to include more memory, such as more on-board memory, so as to enhance parallel processing capabilities, thereby resulting in even faster processing speeds. For instance, in certain embodiments, a chip of the disclosure may include an embedded DRAM, so that the chip does not have to rely on external memory, which would therefore result in a further increase in processing speed, such as where a Burrows-Wheeler algorithm or De Bruijn graph may be employed, instead of a hash table and hash function, which may in various instances rely on external, e.g., host, memory. In such instances, the running of a portion of, or an entire, pipeline can be accomplished in 6 or 10 or 12 or 15 or 20 minutes or less, such as from start to finish.

As indicated above, there are various different points where any given module can be positioned on the hardware, or be positioned remotely therefrom, such as on a server accessible on the cloud. Where a given module is positioned on the chip, e.g., hardwired into the chip, its function may be performed by the hardware; however, where desired, the module may be positioned remotely from the chip, at which point the platform may include the necessary instrumentality for sending the relevant data to a remote location, such as a server accessible via the cloud, so that the particular module's functionality may be engaged for further processing of the data, in accordance with the user-selected desired protocols. Accordingly, part of the platform may include a web-based interface for the performance of one or more tasks pursuant to the functioning of one or more of the modules disclosed herein. For instance, where mapping, alignment, and/or sorting are all modules that may occur on the chip, in various instances, one or more of local realignment, duplicate marking, base quality score recalibration, and/or variant calling may take place on the cloud.

Particularly, once the genetic data has been generated and/or processed, e.g., in one or more primary and/or secondary processing protocols, such as by being mapped, aligned, and/or sorted, such as to produce one or more variant call files, for instance, to determine how the genetic sequence data from a subject differs from one or more reference sequences, a further aspect of the disclosure may be directed to performing one or more other analytical functions on the generated and/or processed genetic data, such as for further, e.g., tertiary, processing, as depicted in FIG. 40. For example, the system may be configured for further processing of the generated and/or secondarily processed data, such as by running it through one or more tertiary processing pipelines 700, such as one or more of a genome pipeline, an epigenome pipeline, a metagenome pipeline, a joint genotyping pipeline, a MuTect2 pipeline, or another tertiary processing pipeline, such as by the devices and methods disclosed herein.

For instance, in various instances, an additional layer of processing 800 may be provided, such as for disease diagnostics, therapeutic treatment, and/or prophylactic prevention, such as including NIPT, NICU, cancer, LDT, AgBio, and other such disease diagnostics, prophylaxis, and/or treatments employing the data generated by one or more of the present primary and/or secondary and/or tertiary pipelines. For example, particular bioanalytic pipelines include genome pipelines, epigenome pipelines, metagenome pipelines, joint genotyping pipelines, GATK/MuTect2 pipelines, and other such pipelines. Hence, the devices and methods herein disclosed may be used to generate genetic sequence data, which data may then be used to generate one or more variant call files and/or other associated data that may further be subject to the execution of other tertiary processing pipelines in accordance with the devices and methods disclosed herein, such as for particular and/or general disease diagnostics as well as for prophylactic and/or therapeutic treatment and/or developmental modalities. See, for instance, FIGS. 41B and 43.

As described above, the methods and/or systems herein presented may include the generating and/or otherwise acquiring of genetic sequence data. Such data may be generated or otherwise acquired from any suitable source, such as by an NGS or "sequencer on a chip" technology. Once generated and/or acquired, the methods and systems herein may include subjecting the data to further processing, such as by one or more secondary processing protocols. The secondary processing protocols may include one or more of mapping, aligning, and sorting of the generated genetic sequence data, such as to produce one or more variant call files, for example, so as to determine how the genetic sequence data from a subject differs from one or more reference sequences or genomes. A further aspect of the disclosure may be directed to performing one or more other analytical functions on the generated and/or processed genetic data, e.g., secondary result data, such as for additional, e.g., tertiary, processing, which processing may be performed on or in association with the same chip or chipset as that hosting the aforementioned sequencer technology.

Accordingly, in a first instance, such as with respect to the generation, acquisition, and/or transmission of genetic sequence data, as set forth in FIGS. 38-40, such data may be produced either locally or remotely, and/or the results thereof may then be directly processed, such as by a local computing resource 100, or may be transmitted to a remote location, such as to a remote computing resource 300, for further processing, e.g., for secondary and/or tertiary processing. For instance, the generated genetic sequence data may be processed locally, and directly, such as where the sequencing and secondary processing functionalities are housed on the same chipset and/or within the same device on-site 10. Likewise, the generated genetic sequence data may be processed locally, and indirectly, such as where the sequencing and secondary processing functionalities occur separately by distinct apparatuses that share the same facility or location but may be separated by a space, albeit communicably connected, such as via a local network 10. In a further instance, the genetic sequence data may be derived remotely, such as by a remote NGS, and the resultant data may be transmitted over a cloud-based network 50 to a remote location 300, such as one geographically separated from the sequencer.

Specifically, as illustrated in FIG. 40, in various embodiments, a data generation apparatus, e.g., a nucleotide sequencer 110, may be provided on-site, such as where the sequencer is a "sequencer on a chip" or an NGS, wherein the sequencer is associated with a local computing resource 100 either directly or indirectly, such as by a local network connection 10. The local computing resource 100 may include or otherwise be associated with one or more of a data generation 110 and/or a data acquisition 120 mechanism(s). Such mechanisms may be any mechanism configured for either generating and/or otherwise acquiring data, such as analog, digital, and/or electromagnetic data related to one or more genetic sequences of a subject or group of subjects, such as where the genetic sequence data is in a BCL or FASTQ file format.

For example, such a data generating mechanism 110 may be a primary processor, such as a sequencer, e.g., an NGS, a sequencer on a chip, or another like mechanism for generating genetic sequence information. Further, such data acquisition mechanisms 120 may be any mechanism configured for receiving data, such as generated genetic sequence information, and/or, together with the data generator 110 and/or computing resource 100, capable of subjecting the same to one or more secondary processing protocols, such as a secondary processing pipeline apparatus configured for running a mapper, aligner, sorter, and/or variant caller protocol on the generated and/or acquired sequence data as herein described. In various instances, the data generating 110 and/or data acquisition 120 apparatuses may be networked together, such as over a local network 10, such as for local storage 200, or may be networked together over a local and/or cloud-based network 30, such as for transmitting and/or receiving data, such as digital data related to the primary and/or secondary processing of genetic sequence information, such as to or from a remote location, such as for remote processing 300 and/or storage 400. In various embodiments, one or more of these components may be communicably coupled together by a hybrid network as herein described.

The local computing resource 100 may also include or otherwise be associated with a compiler 130 and/or a processor 140, such as a compiler 130 configured for compiling the generated and/or acquired data and/or data associated therewith, and a processor 140 configured for processing the generated and/or acquired and/or compiled data and/or controlling the system 1 and its components, as herein described, such as for performing primary, secondary, and/or tertiary processing. For instance, any suitable compiler may be employed; however, in certain instances, further efficiencies may be achieved not only by implementing a tight-coupling configuration, such as discussed above, for the efficient and coherent transfer of data between system components, but further by implementing a just-in-time (JIT) computer language compiler configuration.

Specifically, as used herein, just-in-time (JIT) refers to a device, system, and/or method for converting acquired and/or generated file formats from one form to another. In a broad usage structure, the JIT system disclosed herein may include a compiler 130, or other computing architecture, e.g., a processing program, that may be implemented in a manner so as to convert various code from one form into another. For instance, in one implementation, a JIT compiler may function to convert bytecode, or other program code that contains instructions that must be interpreted, into instructions that can be sent directly to an associated processor 140 for near-immediate execution, such as without the need for interpretation of the instructions by the particular machine language. Particularly, after a coding program, e.g., a Java program, has been written, the source language statements may be compiled by the compiler, e.g., a Java compiler, into bytecode, rather than compiled into code that contains instructions that match any given particular hardware platform's processing language. This bytecode, therefore, is platform-independent code that can be sent to any platform and run on that platform regardless of its underlying processor. Hence, a suitable compiler may be a compiler that is configured so as to compile the bytecode into platform-specific executable code that may then be executed immediately. In this instance, the JIT compiler may function to immediately convert one file format into another, such as "on the fly."

Hence, a suitably configured compiler, as herein described, is capable of overcoming various deficiencies in the art. Specifically, past compiling programs that were written in a specific language had to be recompiled and/or re-written dependent on each specific computer platform on which they were to be implemented. In the present compiling system, the compiler may be configured so as to only have to write and compile a program once, and once written in a particular form, it may be converted into one or more other forms nearly immediately. More specifically, the compiler may be a JIT, or other similar dynamic translation compiler format, which is capable of writing instructions in a platform-agnostic language that does not have to be recompiled and/or re-written dependent on the specific computer platform on which it is implemented. For instance, in a particular use model, the compiler may be configured for interpreting compiled bytecode, and/or other instructions, into instructions that are understandable by a given particular processor for the conversion of one file format into another, regardless of computing platform. Principally, the JIT system herein is capable of receiving one genetic file, such as one representing a genetic code, for example, where the file is a BCL or FASTQ file, e.g., generated from a genetic sequencer, and rapidly converting it into another form, such as into a SAM, BAM, and/or CRAM file, such as by using the methods disclosed herein.

Particularly, in various instances, the system herein disclosed may include a first and/or a second compiler 130 a and 130 b, such as a virtual compiling machine, that handles one or a plurality of bytecode instruction conversions at a time. For instance, using a Java-type just-in-time compiler, or other suitably configured second compiler, within the present system platform will allow for the compiling of instructions into bytecode that may then be converted into the particular system code, e.g., as though the program had been compiled initially on that platform. Accordingly, once the code has been compiled and/or (re-)compiled, such as by the JIT compiler(s) 130, it will run more quickly in the computer processor 140. Hence, in various embodiments, just-in-time (JIT) compilation, or other dynamic translation compilation, may be configured so as to be performed during execution of a given program, e.g., at run time, rather than prior to execution. In such an instance, this may include the step(s) of translation to machine code, or translation into another format, which may then be executed directly, thereby allowing for one or more of ahead-of-time compilation (AOT) and/or interpretation.

More particularly, as implemented within the present system, a typical genome sequencing dataflow generally produces data in one or more file formats, derived from one or more computing platforms, such as in a BCL, FASTQ, SAM, BAM, CRAM, and/or VCF file format, or their equivalents. For instance, a typical DNA sequencer 110, e.g., an NGS, produces raw signals representing called bases that are designated herein as reads, such as in a BCL and/or FASTQ file, which may optionally be further processed, e.g., by enhanced image processing, and/or compressed 150. Likewise, the reads of the generated BCL/FASTQ files may then be further processed within the system, as herein described, so as to produce mapping and/or alignment data, which produced data, e.g., of the mapped and aligned reads, may be in a SAM or BAM file format, or alternatively a CRAM file format. Further, the SAM or BAM file may then be processed, such as through a variant calling procedure, so as to produce a variant call file, such as a VCF file or gVCF file. Accordingly, all of these produced BCL, FASTQ, SAM, BAM, CRAM, and/or VCF files, once produced, are (extremely) large files that all need to be stored, such as in a system memory architecture, locally 200 or remotely 400. The storage of any one of these files is expensive. The storage of one or more of them, e.g., all of them, is extremely expensive.

As indicated, just-in-time (JIT) or other dual compiling or dynamic translation compilation analysis may be configured and deployed herein so as to reduce such high storage costs. For instance, a JIT analysis scheme may be implemented herein so as to store data in only one format (e.g., a compressed FASTQ or BAM, etc., file format), while providing access to one or more file formats (e.g., BCL, FASTQ, SAM, BAM, CRAM, and/or VCF, etc.). This rapid file conversion process may be effectuated by rapidly processing the genomic data utilizing the herein disclosed respective hardware and/or quantum acceleration platforms, e.g., such as for mapping, aligning, sorting, and/or variant calling (or component functions thereof, such as HMM and Smith-Waterman, compression and decompression, and the like), in hardware engines on an integrated circuit, such as an FPGA, or by a quantum processor. Hence, by implementing JIT or similar analysis along with such acceleration, the genomic data can be processed in a manner so as to generate the desired file formats on the fly, at speeds comparable to normal file access. Thus, considerable storage savings may be realized by JIT-like processing with little or no loss of access speed.
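
As a hedged sketch of the dispatch involved in such JIT-like access (illustrative only; the accel object and its converter names are hypothetical stand-ins for the accelerated engines described herein), only the single underlying file is stored, and any other requested format is regenerated on demand:

    def fetch(fmt, stored_bam, accel):
        # Regenerate the requested format on the fly from the one stored,
        # compressed BAM, using hypothetical accelerated conversion calls.
        converters = {
            "fastq": accel.bam_to_fastq,   # strip alignments back out of the reads
            "cram":  accel.bam_to_cram,    # re-encode against the reference genome
            "vcf":   accel.call_variants,  # rerun accelerated variant calling
        }
        return converters[fmt](stored_bam)

The design choice being illustrated is that the cost of conversion is paid at access time, where acceleration makes it comparable to ordinary file access, rather than at storage time for every format.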

Particularly, two general options are useful for the underlying storage of the genomic data produced herein so as to be accessible for JIT-like processing: the storage of unaligned reads (e.g., which may include compressed FASTQ, or unaligned compressed SAM, BAM, or CRAM files), and the storage of aligned reads (e.g., which may include compressed BAM or CRAM files). However, since the accelerated processing disclosed herein allows any of the referenced file formats to be derived rapidly, the underlying file format for storage may be selected so as to achieve the smallest compressed file size, thereby decreasing the expense of storage. Hence, because of the comparatively smaller file size for unprocessed, e.g., raw unaligned, read data, there is an advantage to storing unaligned reads so that the data fields are minimized.

More particularly, in view of the rapid processing speeds achievable by the devices, systems, and methods disclosed herein, in many instances there may be no need to store mapping and/or alignment information for each and every read, because this information may be rapidly derived upon need, such as on the fly. Further, although a compressed FASTQ (e.g., FASTQ.gz) file format is commonly used for storage of genetic sequence data, such unaligned reads may be stored in more advanced compressed formats as well, such as SAM, BAM, or CRAM files, which may further reduce the file size, such as by use of a compact binary representation and/or more targeted compression methods. Hence, these file formats may be compressed prior to storage, decompressed after storage, and processed rapidly, such as on the fly, so as to convert one file format to another.

However, an advantage to storing aligned reads is that much or all of each read's sequence content can be omitted. Specifically, system efficiency can be enhanced and storage space saved by only storing the differences between the read sequences and the selected reference genome, such as at the indicated variant alignment positions of the read. More specifically, since differences from the reference are usually sparse, the aligned position and list of differences can often be more compactly stored than the original read sequence. Therefore, in various instances, the storage of an aligned read format, e.g., when storing data related to the differences of aligned reads, may be preferable to the storage of unaligned read data. In such an instance, if an aligned read format is used as the underlying storage format, such as in a JIT procedure, other formats, such as SAM, BAM, and/or CRAM compressed file formats, may also be used.
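
The following Python sketch (a simplified, gap-free illustration; production formats such as BAM/CRAM additionally encode indels, clipping, and quality data) shows why difference-based storage of an aligned read is compact and lossless: only the aligned position, read length, and the sparse list of mismatches need be kept, and the full read sequence can be reconstructed from the reference.

    def encode_aligned_read(ref, read, pos):
        # Keep only the positions and bases where the read differs from the
        # reference at its aligned position.
        diffs = [(i, b) for i, b in enumerate(read) if ref[pos + i] != b]
        return {"pos": pos, "len": len(read), "diffs": diffs}

    def decode_aligned_read(ref, rec):
        # Rebuild the original read from the reference plus the stored diffs.
        seq = list(ref[rec["pos"]:rec["pos"] + rec["len"]])
        for i, b in rec["diffs"]:
            seq[i] = b
        return "".join(seq)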

Along with the aligned and/or unaligned read file data to be stored, a wide variety of other data, such as metadata derived from the various computations determined herein, may also be stored. Such computed data may include read mapping, alignment, and/or subsequent processing data, such as alignment scores, mapping confidence, edit distance from the reference, etc. In certain instances, such metadata and/or other extra information need not be retained in the underlying storage for JIT analysis, such as in those instances where it can be reproduced on the fly, such as by the accelerated data processing herein described.

With respect to metadata, this data may be a small file that instructs the system as to how to go backwards or forwards from one file format in conversion to another file format. Hence, the metadata file allows the system to create a bit-compatible version of any other file type. For instance, proceeding forward from an originating data file, the system need only access and implement the instructions of the metadata. Along with rapid file format conversion, JIT also enables rapid compression and/or decompression and/or storage, such as in a genomics dropbox memory cache.

As discussed in greater detail below, once sequence data is generated 110, it may be stored locally 200, and/or may be made accessible for storage remotely, such as in a cloud-accessible dropbox memory cache 400. For example, once in the dropbox, the data may appear as accessible on the cloud 50, and may then be further processed, e.g., substantially immediately. This is particularly useful when there is a mapping/aligning/sorting/variant calling system 100/300 of the disclosure on either side of the cloud 50 interface facilitating the automatic uploading and processing of the data, which can be further processed, such as using the JIT technology herein described.

For instance, an underlying storage format for JIT compiling and/or processing may contain only minimal data fields, such as read name, base quality scores, alignment position and/or orientation in the reference, and a list of differences from the reference, such as where each field may be compressed in an optimal manner for its data type. Various other metadata may be included and/or otherwise associated with the storage file. In such an instance, the underlying storage for JIT analysis may be in a local file system 200, such as on hard disk drives and solid state drives, or a network storage resource, such as a NAS or an object or Dropbox-like storage system 400. Particularly, when various file formats, such as BCL, FASTQ, SAM, BAM, CRAM, VCF, etc., have been produced for a genomic dataset, which may be submitted for JIT processing and/or storage, the JIT or other similar compiling and/or analysis system may be configured so as to convert the data to a single underlying storage format for storage. Additional data, such as metadata and/or other information (which may be small) necessary to reproduce all other desired formats by accelerated genomic data processing, may also be associated with the file and stored. Such additional information may include one or more of: a list of file formats to be reproduced; data processing commands to reproduce each format; a unique ID (e.g., URL or MD5/SHA hash) of the reference genome; various parameter settings, such as for mapping, alignment, sorting, variant calling, and/or any other processing, as described herein; randomization seeds for processing steps, e.g., utilizing pseudo-randomization, to deterministically reproduce the same results; user interface information; and the like.

In various instances, the data to be stored and/or retrieved in a JIT or similar dynamic translation processing and/or analysis system may be presented to the user, or to other applications, in a variety of manners. For instance, one option is to have the JIT analysis storage in a standard or custom "JIT object" file format, such as SAM, BAM, CRAM, or another custom format, and to provide user tools to rapidly convert the JIT object into the desired format (e.g., in local temporary storage) using the accelerated processing disclosed herein. Another option is to present the appearance of multiple file formats, such as BCL, FASTQ, SAM, BAM, CRAM, VCF, etc., to the user, and to the user applications, in such a manner that file-system access to the various file formats utilizes a JIT procedure. A further option is to make user tools that otherwise accept specific file formats (BCL, FASTQ, SAM, BAM, CRAM, VCF, etc.) able to accept a JIT object instead, whereby such tools may automatically call for JIT analysis to obtain the data in the desired data format, e.g., BCL, FASTQ, SAM, BAM, CRAM, VCF, etc.

Accordingly, JIT procedures are useful for providing access to multiple file formats, e.g., BCL, FASTQ, SAM, BAM, CRAM, VCF, and the like, from a single file format by rapidly processing the underlying stored compressed file format. Additionally, JIT remains useful even if only a single file format is to be accessed, because compression is still achieved relative to storing the accessed format directly. In such an instance, the underlying file storage format may be different than the accessed file format, and/or may contain less metadata, and/or may be compressed more efficiently than the accessed format.

In various instances, the methods of JIT analysis, as provided herein, may also be used for transmission of genomic data, over the internet or another network, so as to minimize transmission time and lessen the consumed network bandwidth. Particularly, in the storage application, a single compressed underlying file format may be stored, and/or one or more formats may be accessed via accelerated genomic data processing. Similarly, in the transmission application, only a single compressed underlying file format may be transmitted from a source network node to a destination network node, such as where the underlying format may be chosen primarily for the smallest compressed file size, and/or where all desired file formats may be generated at the destination node by or for genomic data processing, such as on the fly. In this manner, only one compressed data file format need be used for storage and/or transfer, from which file format the other various file formats may be derived.

For instance, hardware and/or quantum accelerated genomic data processing, as herein described, may be utilized in (or by) both the source network node, to generate and/or compress the underlying format for transmission, and the destination network node, to decompress and/or generate other desired file formats by accelerated genomic data processing. Nevertheless, JIT or other dynamic translation analysis continues to be useful in the transmission application even if only one of the source node or the destination node utilizes hardware and/or quantum accelerated genomic data processing. For example, a data server that sends large amounts of genomic data may utilize hardware and/or quantum accelerated genomic data processing so as to generate the compressed underlying format for transmission to various destinations. In such instances, each destination may use slower software genomic data processing to generate other desired data formats. Hence, although the speed advantage of JIT analysis is lessened at the destination node, transmission time and network utilization are still usefully reduced, and the source node is able to service many such transmissions efficiently due to its hardware and/or quantum accelerated genomic data processing.

Further, in another example, a data server that receives uploads of large amounts of genomic data, e.g., from various sources, may utilize hardware and/or quantum accelerated genomic data processing and/or storage, while the various source nodes may use slower software run on a CPU/GPU to generate the compressed underlying file format for transmission. Alternatively, hardware and/or quantum accelerated genomic data processing may be utilized by one or more intermediate network nodes, such as a gateway server, between the source and destination nodes, to transmit and/or receive genomic data in a compressed underlying file format, according to the JIT or other dynamic translation analysis methods, thus gaining the benefits of reduced transmission time and network utilization without overburdening the intermediate network nodes with excessive software processing.

Hence, as can be seen with respect to FIG. 40, in certain instances, the local computing resource 100 may include a compiler 130, such as a JIT compiler, and may further include a compressor unit 150 that is configured for compressing data, such as generated and/or acquired primary and/or secondary processed data, which data may be compressed, such as prior to transfer over a local 10 and/or cloud 30 and/or hybrid cloud-based 50 network, such as in a JIT analysis procedure, and which may be decompressed subsequent to transfer and/or prior to use.

As described above, in various instances, the system may include a first integrated and/or quantum circuit 100, such as for performing a mapping, aligning, sorting, and/or variant calling operation, so as to generate one or more of mapped, aligned, sorted, and/or variant called results data. Additionally, the system may include a further integrated and/or quantum circuit 300, such as for employing the results data in the performance of one or more genomics and/or bioinformatics pipeline analyses, such as for tertiary processing. For instance, the results data generated by the first integrated and/or quantum circuit 100 may be used, e.g., by the first or a second integrated and/or quantum circuit 300, in the performance of a further genomics and/or bioinformatics pipeline processing procedure. Specifically, secondary processing of genomics data may be performed by a first hardware and/or quantum accelerated processor 100 so as to produce results data, and tertiary processing may be performed on that results data, such as where the further processing is performed by a CPU and/or GPU and/or QPU 300 that is operatively coupled to the first integrated circuit. In such an instance, the second circuit 300 may be configured for performing tertiary processing of the genomics variation data produced by the first circuit 100. Accordingly, the results data derived from the first integrated circuit acts as an analysis engine driving the further processing steps described herein with reference to tertiary processing, such as by the second integrated and/or quantum processing circuit 300.

However, the data generated in each of these primary and/or secondary and/or tertiary process steps may be immense, requiring very high resource and/or memory costs, such as for storage, either locally 200 or remotely 400. For instance, in a first primary processing step, generated nucleic acid sequence data 110, such as in a BCL and/or FASTQ file format, may be received 120, such as from an NGS 110. Regardless of the file format of this sequence data, the data may be employed in a secondary processing protocol as described herein. The ability to receive and process primary sequence data directly from an NGS, such as in a BCL and/or FASTQ file format, is very useful. Particularly, instead of converting the sequence data file from the NGS, e.g., BCL, to a FASTQ file, the file may be directly received from the NGS, e.g., as a BCL file, and may be processed, such as by being received and converted by the JIT system, e.g., on the fly, into a FASTQ file that may then be processed, as described herein, such as to produce mapped, aligned, sorted, and/or variant called results data that may then be compressed, such as into a SAM, BAM, and/or CRAM file, and/or may be subjected to further processing, such as by one or more of the disclosed genomics tertiary processing pipelines.

Accordingly, such data, once produced, needs to be stored in some manner. However, such storage is not only resource intensive, it is also costly. Specifically, in a typical genomics protocol, the sequenced data, once generated, is stored as a large FASTQ file. Then, once processed, such as by being subjected to a mapping and/or aligning protocol, a BAM file is created, which file is also typically stored, increasing the expense of genomic data storage, such as by having to store both a FASTQ and a BAM file. Further, once the BAM file is processed, such as by being subjected to a variant calling protocol, a VCF file is produced, which VCF also typically needs to be stored. In such an instance, in order to adequately provide and make use of the generated genetic data, all three of the FASTQ, BAM, and VCF files may need to be stored, either locally 200 or remotely 400. Additionally, the original BCL file may also be stored. Such storage is inefficient as well as being memory-resource intensive and expensive.

However, the computational power of the hardware and/or quantum processing architectures implemented herein, along with the JIT compilation, compression, and storage, greatly ameliorates these inefficiencies, resource costs, and expenses. For instance, in view of the methods implemented and the processing speeds achieved by the present accelerated integrated circuits, such as for the conversion of a BCL file to a FASTQ file, then the conversion of a FASTQ file to a SAM or BAM file, and then the conversion of a BAM file to a CRAM and/or VCF file, and back again, the present system greatly reduces the number of computing resources and/or the file sizes needed for the efficient processing and/or storage of such data. The benefits of these systems and methods are further enhanced by the fact that only one file format, e.g., a BCL, FASTQ, SAM, BAM, CRAM, and/or VCF file, need be stored, from which all the other file formats may be derived and processed. Particularly, only one file format needs to be saved, and from such a file any of the other file formats may be generated rapidly, e.g., on the fly, in accordance with the methods disclosed herein, such as in a just-in-time, or JIT, compiling format.

For example, in accordance with typical prior methods, a large amount of computing resources, e.g., server farms and large memory banks, is needed for the processing and storage of FASTQ files being generated by an NGS sequencer. Particularly, in a typical instance, once the NGS produces the large FASTQ file, a server farm would then be employed to receive and convert the FASTQ file to a BAM and/or CRAM file, which processing may take up to a day or more. However, once produced, the BAM file itself must then be stored, requiring further time and resources. Likewise, the BAM or CRAM file may be processed in such a manner as to generate a VCF, which may take up another day or more, and which file will also need to be stored, thereby incurring further resource costs and expenses. More particularly, in a typical instance, the FASTQ file for a human genome consumes about 90 GB of storage, per file. Likewise, a typical human genome BAM file may consume about 160 GB. The VCF file may also need to be stored, albeit such files are quite a bit smaller than the FASTQ and/or BAM files. SAM and CRAM files may also be generated throughout the secondary processing procedures, and these too may need to be stored.

Prior to the technologies provided herein, it has been computationally intensive to go from one step to another, e.g., from one file format to another, and hence all of the data for these file formats would typically have to be stored. This is in part due to the fact that, if a user ever wanted to go back and regenerate one or more of the files, it would require a large amount of computing resources and time to re-do the processes involved in regenerating the various files, thereby incurring a high monetary expense. Further, where these files are compressed before storage, such compression may take from about 2 to about 5 to about 10 or more hours, with about the same amount of time required for decompression prior to reuse. Because of these high expenses, typical users would not compress such files prior to storage, and would also typically store all two, three, or more file formats, e.g., BCL, FASTQ, BAM, VCF, incurring increased costs over increased time.

Accordingly, the JIT protocols employed herein make use of the accelerated processing speeds achieved by the present hardware and/or quantum accelerators, so as to realize enhanced efficiency, at reduced time and costs, both for processing as well as for storage. Instead of storing 2, 3, or more copies of the same general data in different file formats, only one file format needs to be stored, and on the fly, any of the other file types can be regenerated, such as using the accelerated processing platforms discussed herein. Particularly, from storing a FASTQ file, the present devices and systems make it easy to go backwards to a BCL file, or forwards to a BAM file, and then further to a VCF, such as in under 30 minutes, such as within 20 minutes, or within about 15 or 10 minutes, or less.

Hence, using the pipelines and the speed of processing offered by the hardwired/quantum processing engines herein disclosed, only a single file format need be stored, while the other file formats may easily and rapidly be generated therefrom. So instead of needing to store all three file formats, a single file format need be stored from which any other file format may be regenerated, such as on the fly, just in time for the further processing steps desired by the user. Consequently, the system may be configured for ease of use such that if a user simply interacts with a graphical user interface, such as presented at an associated display of the device, e.g., the user clicks on the FASTQ, BAM, VCF, etc. button presented in the GUI, the desired file format may be presented, while in the background, one or more of the processing engines of the system may be performing the accelerated processing steps necessary for regenerating the requested file in the requested file format from the stored file.

Typically, one or more of a compressed version of a BCL, FASTQ, SAM, BAM, CRAM, and/or VCF file will be saved, along with a small metafile that includes all of the configurations of how the system was run to create the compressed and/or stored file. Such metafile data details how the particular file format, e.g., FASTQ and/or BAM file, was generated and/or what steps would be necessary for going backwards or forwards so as to generate any of the other file formats. In a manner such as this the process can proceed forwards or be reversed going backwards using the configuration stored in the metafile. This can amount to about an 80% or more reduction in storage and economic cost if the computing function is bundled with the storage functions.
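
A minimal sketch of such a metafile writer, in Python and with illustrative, hypothetical field names (the disclosure does not prescribe a particular schema), might be:

```python
import hashlib
import json
import time

def write_metafile(stored_file, stored_format, pipeline_config, meta_path):
    """Record how the stored file was produced so that any other file format
    can later be regenerated, forwards or backwards, with the identical
    configuration. Field names are illustrative only."""
    with open(stored_file, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    meta = {
        "stored_file": stored_file,
        "stored_format": stored_format,                  # e.g., "CRAM"
        "sha256": digest,
        "created": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "reference_genome": pipeline_config.get("reference"),
        "pipeline_steps": pipeline_config.get("steps"),  # mapping, aligning, ...
        "engine_versions": pipeline_config.get("versions"),
    }
    with open(meta_path, "w") as out:
        json.dump(meta, out, indent=2)
```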

As can be seen with respect to FIG. 40, these files may be stored 200/400 on a server that is accessible via the cloud 30/50. As such, the system is set up such that, with respect to a user storing generated sequence data on the cloud, the user of the system may log on and access the cloud based server and thereby store the generated data in a single file format, such as on a hybrid cloud 400. Further, with the click of a button the user can access all of the other file formats, which would then be processed and generated behind the scenes, e.g., on the fly, thus cutting down on both processing time and burden as well as storage costs, such as where the computing and the storage functions are bundled together.

Accordingly, there are two parts of this process that are enabled by the speed of performing the accelerated mapping, aligning, sorting, and/or variant calling functions in the hardwired and/or quantum processing configuration, so as to enable seamless compression and storing of only a single file type with on-the-fly regeneration of any of the other file types. In particular embodiments, it would be the BAM file, or a compressed SAM or CRAM file associated therewith, which would be stored, and from that file the others may be generated, e.g., in a forward or a reverse direction, such as to reproduce a VCF or FASTQ or BCL file, respectively. For instance, when a FASTQ file is originally stored, when going in the forward direction, a checksum of the file may be taken. Likewise, when going backward, a checksum may be generated on the file that is being recreated going backward, and these checksums may then be used to ensure that the recreated files match identically to one another and/or their compressed file formats. In a manner such as this it may be ensured that all of the data is stored, the system knows exactly where the data is stored, in what file format it is stored, and what the original file format was, and from this data the system can regenerate any file format in an identical manner going forwards or backwards between file formats (once the template is originally generated).
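
For illustration, a simple checksum round-trip check of this kind may be sketched in Python as follows; SHA-256 is used here only as an example digest, as the disclosure does not specify a particular checksum algorithm.

```python
import hashlib

def sha256_of(path, chunk=1 << 20):
    """Stream a file through SHA-256 so that large genomic files need not be
    held in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()

def verify_roundtrip(original_path, recreated_path):
    """Confirm that a regenerated file (e.g., a FASTQ rebuilt from a stored
    BAM) is identical to the file that was originally stored."""
    return sha256_of(original_path) == sha256_of(recreated_path)
```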

Hence, the speed advantage of the "just in time" compiling is enabled in part by the hardware and/or quantum implemented generation of the relevant files, such as in generating a BAM file from a previously generated FASTQ file. Particularly, compressed BAM files, including SAM and CRAM files, are not typically stored within a database because of the increased time it takes prior to processing to decompress the compressed stored file. However, the JIT system allows this to be done without substantial penalties. More particularly, implementing the devices and processes disclosed herein, not only can generated sequence data be compressed and decompressed rapidly, e.g., almost instantaneously, it may also be stored efficiently. Additionally, from the stored file, in whatever file format it is stored, any of the other file formats may be regenerated in mere moments.

Hence, as can be seen with reference to FIG. 41A, when the accelerated hardware and/or quantum processing performs various secondary processing procedures, such as mapping and aligning, sorting, and variant calling, a further step of compression may also be performed, such as in an all-in-one process, prior to storage in the compressed form. Then when the user desires to analyze or otherwise use the compressed data, the file may be retrieved, decompressed, and/or converted from one file format to another, and/or be analyzed, such as by the JIT engine(s) being loaded into the hardwired processor, or configured within the quantum processor, and subjecting the compressed file to one or more procedures of the JIT pipeline.

Accordingly, in various instances, the FPGA can be fully or partially reconfigured, and/or a quantum processing engine may be organized, so as to perform a JIT procedure. Particularly, the JIT module can be loaded into the system and/or configured as one or more engines, which engines may include one or more compression engines 150 that are configured for working in the background. Hence, when a given file format is called, the JIT-like system may perform the necessary operations on the requested data so as to produce a file in the requested format. These operations may include compression and/or decompression as well as conversion so as to derive the requested data in the identified file format.

For instance, when genetic data is generated, it is usually produced in a raw data format, such as a BCL file, which then may get converted into a FASTQ file, e.g., by the NGS that generates the data. However, with the present system, the raw data files, such as in BCL or other raw file format, may be streamed or otherwise transmitted into the JIT module, which can then convert the data into a FASTQ file and/or into another file format. For example, once a FASTQ file is generated, the FASTQ file may then be processed, as disclosed herein, and a corresponding BAM file may be generated. And likewise, from the BAM file a corresponding VCF may be generated. Additionally, SAM and CRAM files may also be generated during appropriate steps. Each one of these steps may be performed very rapidly, especially once the appropriate file format has been generated. Hence, once the BCL file is received, e.g., straight from the sequencer, the BCL can be converted into a FASTQ file or be directly converted into a SAM, BAM, CRAM, and/or VCF file, such as by a hardware and/or quantum implemented mapping/aligning/sorting/variant calling procedure.

For example, in one use model, on a typical sequencing instrument, a large number of different subjects' genomes may be loaded into individual lanes of a single sequencing instrument to be run in parallel. Consequently, at the end of the run, a large number of diverse BCL files, derived from all the different lanes and representing the whole genomes of each of the different subjects, are generated in a multiplexed complex. Accordingly, these multiplexed BCL files may then be de-multiplexed, and respective FASTQ files may be generated representing the genetic code for each individual subject. For instance, if in one sequencing run N BCL files are generated, these files will need to be de-multiplexed, layered, and stitched together for each subject. This stitching is a complex process whereby each subject's genetic material is converted to BCL files, which may then be converted to a FASTQ file or used directly for mapping, aligning, and/or sorting, variant calling, and the like. This process may be automated so as to greatly speed up the various steps of the process.
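
A greatly simplified demultiplexing sketch, in Python, is shown below; it assumes the raw records have already been converted to (barcode, name, sequence, qualities) tuples, which is an assumption made for illustration rather than a description of the actual BCL layout.

```python
from collections import defaultdict

def demultiplex(read_records, barcode_to_subject):
    """Split a multiplexed run into per-subject read sets.

    `read_records` is an iterable of (barcode, name, sequence, qualities)
    tuples; `barcode_to_subject` maps index barcodes to subject identifiers.
    Reads with an unrecognized barcode are grouped under "undetermined".
    """
    per_subject = defaultdict(list)
    for barcode, name, seq, quals in read_records:
        subject = barcode_to_subject.get(barcode, "undetermined")
        per_subject[subject].append((name, seq, quals))
    return per_subject

def write_fastq(records, path):
    """Emit one subject's demultiplexed reads as a FASTQ file."""
    with open(path, "w") as out:
        for name, seq, quals in records:
            out.write(f"@{name}\n{seq}\n+\n{quals}\n")
```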

As can be seen with respect to FIG. 40, once this data has been generated 110, it may then be stored in a password protected and/or encrypted memory cache, such as in a dedicated genomics dropbox-like memory 400. Accordingly, as the generated and/or processed genetic data comes off of the sequencer, the data may be processed and/or stored and made available to other users on other systems, such as in a dropbox-like cache 400. In such an instance, the automated bioinformatics analysis pipeline system may then access the data in the cache and automatically begin processing it. For example, as can be seen with respect to FIG. 41B, the system may include a management system having a controller, such as a microprocessor or other intelligence, e.g., artificial intelligence, that manages the retrieving of the BCL and/or FASTQ files, e.g., from the memory cache, and then directs the processing of that information, so as to generate a BAM, CRAM, SAM, and/or VCF, thereby automatically generating and outputting the various processing results and/or storing the same in the dropbox memory 400.

A unique benefit of JIT processing, as implemented within this use model, is that JIT allows the various genetic files produced to be compressed, e.g., prior to data storage, and to be decompressed rapidly prior to usage. Hence, JIT processing can compile and/or compress and/or store the data as it is coming off the sequencer, where such storage is in a secure genomic dropbox memory cache. This genomic dropbox cache 400 may be a cloud 50 accessible memory cache that is configured for the storing of genomics data received from one or more automated sequencers 110, such as where the sequencer(s) are located remotely from the memory cache 400.

Particularly, once the sequence data has been generated 110, e.g., by a remote NGS, it may be compressed 150 for transmission and/or storage 400, so as to reduce the amount of data that is being uploaded to and stored in the cloud 50. Such uploading, transmission, and storage may be performed rapidly because of the data compression 150 that takes place in the system, such as prior to transmission. Additionally, once uploaded and stored in the cloud based memory cache 400, the data may then be retrieved, locally 100 or remotely 300, so as to be processed in accordance with the devices, systems, and methods of the BioIT pipeline disclosed herein, so as to generate a mapping, aligning, sorting, and/or variant call file, such as a SAM, BAM, and/or CRAM file, which may then be stored, along with a meta-file that sets forth the information as to how the generated file, e.g., SAM, BAM, CRAM, etc. file, was produced.
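
As a simple illustration of compressing results prior to upload, the following Python sketch gzips a locally produced file; gzip is used only as an example codec, and the transfer itself is left to whatever upload client is in use.

```python
import gzip
import shutil

def compress_for_upload(src_path):
    """Gzip a locally produced results file (e.g., a SAM) before it is handed
    to the upload client, reducing transmission time and storage footprint."""
    dst_path = src_path + ".gz"
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path
```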

Hence, when taken together with the metadata, the compressed SAM, BAM, and/or CRAM file may then be processed to produce any of the other file formats, such as FASTQ and/or VCF files. Accordingly, as discussed above, on the fly, JIT can be used to regenerate the FASTQ file or VCF from the compressed BAM file and vice versa. The BCL file can also be regenerated in like manner. It is to be noted that SAM and CRAM files can likewise be compressed and/or stored and can be used to produce one or more of the other file formats. For instance, a CRAM file, which can be un-CRAMed, can be used to produce a variant call file, and likewise for the SAM file. Hence, only the SAM, BAM, and/or CRAM file need be saved, and from these files the other file formats, e.g., VCF, FASTQ, BCL files, can be reproduced.

Accordingly, as can be seen with respect to FIG. 40, a mapping and/or aligning and/or sorting and/or variant calling instrument 110 may be on-site 100 and/or another second corresponding instrument 300 may be located remotely and made accessible in the cloud 30/50. This configuration, along with the devices and methods disclosed herein, is configured to enable a user to rapidly perform a BioIT analysis, as herein disclosed, so as to produce results data, which results data may then be processed so as to be compressed, and once compressed the data may be uploaded and made accessible via a cloud based interface. In such an instance, the compressed and uploaded data may be stored 400, e.g., "in the cloud," such as a SAM, BAM, and/or CRAM file.

Further, when desired, the second mapping and/or aligning and/or sorting and/or variant calling instrument 300, e.g., associated with the cloud 50, may then access the stored and/or compressed file(s) and may process those files so as to rapidly generate a BCL, FASTQ, SAM, BAM, VCF, or other file format from the stored and/or compressed files, e.g., on the fly, using JIT processing. This configuration thereby alleviates the typical transfer speed bottleneck. Hence, in various embodiments, the system may include a first mapping and/or aligning and/or sorting and/or variant calling instrument 100, which may be positioned locally, such as for local data production, compression 150, and/or storage 200; and a second instrument 300 may be positioned remotely and associated in the cloud 50, whereby the second instrument 300 is configured for receiving the generated and compressed data and storing it, e.g., via an associated storage device 400. Once stored, the data may be accessed, e.g., by the first and/or second instrument, for decompression and conversion of the stored files into one or more of the other file formats.

Therefore, in one implementation of the system as exemplified in FIG. 40, data, e.g., raw sequence data such as in a BCL or FASTQ file format, which is generated by a data generating apparatus, e.g., a sequencer 110, may be uploaded and stored in the cloud 30/50, such as in an associated genomics dropbox-like memory cache 400. This data may then be accessed by a first mapping and/or aligning and/or sorting and/or variant calling instrument 100, as described herein, which may then process the sequence data to produce mapped, aligned, sorted, and/or variant results data. This results data may then be compressed and/or stored in the genomics dropbox cache 400, such as in a SAM, BAM, CRAM, and/or VCF file. It is to be noted that the first instrument 100 may be local and associated with the sequencing instrument 110 itself, or may be remote and associated with a local cloud 30 and/or a local 200 or remote memory cache 400. A second mapping and/or aligning and/or sorting and/or variant calling instrument 300, e.g., a cloud based instrument, with the proper authorization, may then connect with the genomics dropbox 400, so as to access the files, e.g., compressed files, and may then decompress those files to make the results available for further, e.g., secondary or tertiary, processing.

Accordingly, in various instances, the system may be streamlined such that as data is generated and comes off of the sequencer 110, such as in raw data format, it may either be immediately uploaded into the cloud 50 and stored in a genomics dropbox 400, or it may be transmitted to a BioIT processing system 300 for further processing and/or compression prior to being uploaded and stored 400. Once stored within the memory cache 400, the system may then immediately queue up the data for retrieval, compression, decompression, and/or for further processing, such as by another associated BioIT processing apparatus 300, which when processed into results data may then be compressed and/or stored 400 for further use later. At this point, a tertiary processing pipeline may be initiated whereby the stored results data from secondary processing may be decompressed and used, such as for tertiary analysis, in accordance with the methods disclosed herein.

Hence, in various embodiments, the system may be pipelined such that all of the data that comes off of the sequencer 110 may either be compressed, e.g., by a local computing resource 100, prior to transfer and/or storage 200, or the data may be transferred directly into the genomics dropbox folder for storage 400. Once received thereby, the stored data may then substantially immediately be queued for retrieval and compression and/or decompression, such as by a remote computing resource 300. After being decompressed the data may substantially immediately be available for processing, such as for mapping, aligning, sorting, and/or variant calling, to produce secondarily processed results data that may then be re-compressed for storage. Afterward, the compressed secondary results data may then be accessed, e.g., in the genomics dropbox 400, be decompressed, and/or be used in one or more tertiary processing procedures. As the data may be compressed when stored and substantially immediately decompressed when retrieved, it is available for use by many different systems and in many different bioanalytical protocols at different times, simply by accessing the dropbox storage cache 400.

Therefore, in such manners as these, the Bio-IT platform pipelines presented herein may be configured so as to offer incredible flexibility of data generation and/or analysis, and are adapted to handle the input of particular forms of genetic data in multiple formats so as to process the data and produce output formats that are compatible with various downstream analyses. Accordingly, as can be seen with respect to FIG. 40, presented herein are devices, systems, and methods for performing genetic sequencing analysis, which may include one or more of the following steps. First, a file input is received; the input may be in one or more of a FASTQ or BCL file format, which file may then be decompressed and/or processed herein so as to generate a VCF/gVCF. Such compression and/or decompression may occur at any suitable time throughout the process.

Accordingly, in certain instances, the file to be received by the system may be streamed or otherwise transferred to the system directly from the sequencing apparatus, e.g., NGS, and as such the transferred file may be in a BCL file format. Where the received file is in a BCL file format, it may be converted, and/or otherwise demultiplexed, into a FASTQ file for processing by the system, or the BCL file may be processed directly. For instance, the platform pipeline processors can be configured to receive BCL data that is streamed directly from the sequencer, or they may receive data in a FASTQ file format. However, receiving the sequence data directly as it is streamed off of the sequencer is useful because it enables the data to go directly from raw sequencing data to being processed into a VCF for output.

Accordingly, once the BCL or the FASTQ file is received, it may be mapped and/or aligned, which mapping and/or aligning may be performed on single end or paired end reads, such as with read lengths that may range from about 10 or about 20, such as 26 bp or less, up to about 1K, or about 2.5K, or about 5K, or even about 10K bp or more. Once mapped and/or aligned, the sequence may then be sorted, such as position sorted, such as through binning by reference range and/or sorting of the bins by reference position. Additionally, the sequence data may be processed via duplicate marking, such as based on the starting position and CIGAR string, so as to generate a high quality duplicate report. At this point, a SAM file may be generated, which when compressed may form a BAM file, such as for storage and/or further processing. Further, once the BAM file has been retrieved, the sequence data may be forwarded to a variant calling module of the system, such as a haplotype variant caller with reassembly, which in some instances may employ one or more of a Hidden Markov Model and/or Smith-Waterman Alignment that may be implemented in either software and/or hardware, so as to generate a VCF.
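
A simplified duplicate-marking sketch keyed on the alignment start position and CIGAR string is given below in Python; the grouping on strand and the use of mapping quality to choose the retained read are illustrative additions, not requirements of the disclosure.

```python
from collections import defaultdict

def mark_duplicates(alignments):
    """Flag likely duplicates that share the same (reference, start position,
    strand, CIGAR) signature, keeping the highest-quality read per group.

    `alignments` is an iterable of dicts with 'rname', 'pos', 'strand',
    'cigar', and 'mapq' keys; a 'duplicate' flag is added to each in place.
    """
    groups = defaultdict(list)
    for aln in alignments:
        key = (aln["rname"], aln["pos"], aln["strand"], aln["cigar"])
        groups[key].append(aln)
    for group in groups.values():
        group.sort(key=lambda a: a["mapq"], reverse=True)
        group[0]["duplicate"] = False
        for dup in group[1:]:
            dup["duplicate"] = True
    return alignments
```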

Hence, the system and/or one or more of its components may be configured so as to be able to convert BCL data to FASTQ or SAM/BAM/CRAM data formats, which may then be sent throughout the system for further processing and/or data reconstruction. For instance, once the sequence data is mapped or aligned, e.g., to produce a SAM file, the SAM file may then be compressed into one or more BAM files, which may then be transmitted to a VCF engine so as to be converted throughout the processing of the system to a VCF/gVCF, which may then be compressed into a CRAM file. Consequently, the files to be output along the system may be a Gzip and/or CRAM file.

Particularly, as can be seen with respect to FIG. 40, one or more of the files, once generated, may be compressed and/or transferred from one system component to another, and once received may then be decompressed, e.g., if previously compressed, or converted/demultiplexed. More particularly, once a BCL file is received, it may be converted into a FASTQ file that may then be processed by the integrated circuit(s) of the system, so as to be mapped and/or aligned. Once mapped and/or aligned, the resulting sequence data, e.g., in a SAM file format, may be processed further, such as by being compressed one or more times, e.g., into a BAM file, which data may then be processed by position sorting, duplicate marking, and/or variant calling, the results of which, e.g., in a VCF format, may then be compressed once more. Particularly, the system may be adapted so as to process BCL data directly, thereby eliminating a FASTQ file conversion step. Likewise, the BCL data may be fed directly to the pipeline to produce a unique output VCF file per sample. Intermediate SAM/BAM/CRAM files can then be generated on demand. The system, therefore, may be configured for receiving and/or transmitting one or more data files, such as a BCL or FASTQ data file containing sequence information, and processing the same so as to produce a data file that has been compressed, such as a SAM/BAM/CRAM data file.

Accordingly, as can be seen with respect to FIG. 41A, a user may want to access the compressed file and convert it to an original version of the generated BCL 111 c and/or FASTQ file 111 d, such as for subjecting the data to further, e.g., more advanced, signal processing 111 b, such as for error correction. Alternatively, the user may access the raw sequence data, e.g., in a BCL or FASTQ file format 111, and subject that data to further processing, such as for mapping 112 and/or aligning 113. The results data from these procedures may then be compressed and/or stored 114. The same or another user may then want to access the compressed form of the mapped and/or aligned results data and then run another analysis on the data, such as to produce one or more VCFs 115 that may then be compressed and/or stored. An additional user of the system may then access the compressed VCF file 116, decompress it, and subject the data to one or more tertiary processing protocols.

Further, a user may want to do a pipeline compare. The mapping/aligning/sorting/variant calling is useful for performing various genomic analyses. For instance, if a further DNA or RNA analysis, or some other kind of analysis, is afterward desired, a user may want to run the data through another pipeline, and hence having access to the original data file, so that it can be regenerated, is very useful. Likewise, this process may be useful where a different SAM/BAM/CRAM file may be desired to be created, or recreated, such as where there is a new or different reference genome generated, and hence it may be desired to re-do the mapping and aligning to the new reference genome.

Storing the compressed SAM/BAM/CRAM files is further useful because it allows a user of the system to take advantage of the fact that a reference genome forms the backbone of the results data. In such an instance, it is not the data that agrees with the reference that is important, but rather how the data disagrees with the reference. Hence, only that data that disagrees with the reference is essential for storage. Consequently, the system can take advantage of this fact by storing only what is important and/or useful to the users of the system. Thus, the entire genomic file (showing agreement and disagreement with the reference), or a sub-portion of it (showing only agreement or disagreement with the reference), may be configured for being compressed and stored. It may be seen, therefore, that as only the differences and/or variations between the reference and the genome being examined are the most useful to examine, in various embodiments only these differences need be stored, as anything that is the same as the reference need not be reviewed again. Accordingly, since any given genome differs only slightly from a reference, e.g., 99% of human genomes are typically identical, after the BAM file is created it is only the variations from the reference genome that need be reviewed and/or saved.
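
The underlying idea of storing only the disagreements with the reference can be illustrated with the following Python sketch, which is a toy difference encoder rather than the CRAM or BAM compression actually employed:

```python
def encode_differences(sample_seq, reference_seq):
    """Record only the positions at which the sample disagrees with the
    reference; matching positions are omitted because they can be rebuilt
    from the reference itself."""
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference_seq, sample_seq))
        if ref != alt
    ]

def reconstruct(differences, reference_seq):
    """Rebuild the sample sequence from the reference plus the stored
    differences."""
    seq = list(reference_seq)
    for pos, _ref, alt in differences:
        seq[pos] = alt
    return "".join(seq)
```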

Additionally, another useful component of the system is a workflow management controller, which may be used to automate the system flow. Such system automation may include utilizing the various system componentry to access data, either locally or remotely, as and/or where it becomes available, and then substantially automatically subjecting the data to further processing steps, such as with respect to the BioIT pipelines disclosed herein. Accordingly, the workflow management controller is a core automation technology for directing the various pipelines of the system, and in various instances may employ an artificial intelligence component. See FIG. 41B.

Specifically, the workflow management controller allows the system to receive inputs from multiple sequencing instruments, e.g., 110 a, 110 b, 110 c, etc., and/or multiple inputs from a single sequencing instrument 110, where the data being received represents the genomes of multiple subjects. In such instances, the workflow management controller not only keeps track of all of the incoming data, but it also efficiently organizes and facilitates the secondary and/or tertiary processing of the received data. Accordingly, the workflow management controller allows the system to seamlessly connect to both small and large sequencing centers, where all kinds of genetic material may be coming through one or more sequencing instruments at the same time, all of which may be transferred into the system, such as over the cloud 50.

More specifically, as can be seen with respect to FIG. 41, in various instances, one or a multiplicity of samples may be received within the system, and hence the system may be configured for receiving and efficiently processing the samples, either sequentially or in parallel, such as in a multi-sample processing regime. Accordingly, to streamline and/or automate multi-sample processing, the system may be controlled by a comprehensive Workflow Management System (WMS) or LIMS (laboratory information management system). The WMS enables users to easily schedule multiple workflow runs for any pipeline, as well as to adjust or accelerate NGS analysis algorithms, platform pipelines, and their attendant applications.

In such an instance, each run sequence may have a bar code on it indicating the type of sequence it is, the file format, what processing steps have been performed, and what processing steps need to be performed. For instance, the bar code may include a manifest indicating "this is a genome run, of subject X, in file format Y, so this data has to go through pipeline Z," or likewise may indicate "this is A's result data that needs to go in this reporting system." Accordingly, as the data is received, processed, and transmitted through the system, the bar codes and results will get loaded into the workflow management system, such as the LIMS. The LIMS, in this instance, may be a standard tool that is employed for the management of laboratories, or it may be a specifically designed tool used for managing process flow.
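
One possible representation of such a barcode manifest, sketched in Python with illustrative field names only, is:

```python
from dataclasses import dataclass, field

@dataclass
class SampleManifest:
    """What the workflow manager may read off a sample's barcode; the fields
    shown here are illustrative, not a prescribed schema."""
    sample_id: str
    subject: str
    file_format: str                               # e.g., "BCL" or "FASTQ"
    pipeline: str                                  # e.g., "whole-genome"
    reporting_system: str = ""
    completed_steps: list = field(default_factory=list)
    pending_steps: list = field(default_factory=list)

# Example: a genome run of subject X, in BCL format, routed to pipeline Z.
manifest = SampleManifest(
    sample_id="RUN42-L3", subject="X", file_format="BCL", pipeline="Z",
    pending_steps=["map", "align", "sort", "variant_call"],
)
```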

In any instance, the workflow management controller tracks a barcoded sample from when it arrives at a given site, e.g., for storage and/or processing, until the results are sent out to the user. Particularly, the workflow management controller is configured to track all data as it flows through the system end-to-end. More particularly, as the sample comes in, the bar code associated with the sample is read, and based on that reading the system determines what the requested workflows are and prepares the sample for processing. Such processing may be simple, such as being run through a single genome pipeline, or it may be more complex, such as by being run through multiple, e.g., five, pipelines that need to be stitched together. In one particular model the sample may be run through the system, it then may be run through a GATK equivalent module, the results may be compared, and then the sample may be transmitted to another pipeline for further, e.g., tertiary, processing.

Hence, the system as a whole can be run in accordance with several different processing pipelines. In fact, many of the system processes can be interconnected, where the workflow manager is notified or otherwise determines that a new job is pending, quantifies the job matrices, identifies available resources for performing the required analyses, loads the job into the system, receives the data coming in, e.g., off the sequencer, loads it in, and then processes it. Particularly, once the workflow is set up, it can be saved, and then a modified bar code gets assigned to that workflow, and the automated process takes place in accordance with the directives of the workflow.

Prior to the present automated workflow management system, it would take a number of bioinformaticians a long period of time to configure and set up the system and its component parts, and it would then require further time for actually running the analysis. To make matters more complicated, the system would have to be reconfigured prior to receiving the next sample to analyze, requiring even more time to reconfigure the system for analyzing the new sample set. With the technology disclosed herein, the system can be entirely automated. The present system, particularly, is configured so as to automatically receive multiple samples, map them to multiple different workflows and pipelines, and run them on the same or multiple different system cards.

Accordingly, the workflow management system reads the job requirements off the bar codes, allocates resources for performing the jobs, e.g., regardless of location, updates the sample barcode, and directs the samples to the allocated resources, e.g., processing units, for processing. Hence, it is the workflow manager that determines the secondary and/or tertiary analysis protocols that will be run on the received samples. These processing units are resources that are available for delineating and performing the operations allocated to each data set. Particularly, the workflow controller controls the various operations associated with receiving and reading the sample, determining jobs, allocating resources for the performance of those jobs, connecting all system components, and advancing the sample set through the system from component to component. The controller, therefore, acts to manage the overall system from start to finish, e.g., from sample receipt to VCF generation, and/or through to tertiary processing.
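
A toy scheduler illustrating this controller role is sketched below in Python; the resource names, engine callables, and dictionary-based manifest are assumptions made for illustration only.

```python
import queue

class WorkflowManager:
    """Minimal sketch of the controller: take a free processing resource,
    advance a sample through its pending pipeline steps, then return the
    resource to the pool."""

    def __init__(self, resources):
        self.available = queue.Queue()
        for resource in resources:        # e.g., ["fpga-0", "fpga-1", "cloud-0"]
            self.available.put(resource)

    def run(self, manifest, engines):
        """`manifest` is a dict with 'pending_steps' and 'completed_steps'
        lists; `engines` maps step names to callables implementing them."""
        resource = self.available.get()   # block until a resource is free
        try:
            for step in list(manifest["pending_steps"]):
                engines[step](manifest, resource)
                manifest["pending_steps"].remove(step)
                manifest["completed_steps"].append(step)
        finally:
            self.available.put(resource)  # return the resource to the pool
        return manifest
```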

Hence, the system may include a display device having a graphic user interface for allowing a potential user of the system to transmit sample data for entry into one or more of the BioIT pipelines disclosed herein. The GUI is configured for allowing the user to manage the system components, e.g., via a suitably configured web portal, and to track sample processing progress, regardless of whether the computing resources to be engaged are available locally or remotely. Accordingly, the GUI may list a set of jobs that may be performed and/or a set of resources for performing the jobs, and the user may self-select which jobs they want run and by which resources. Hence, in an instance such as this, each individual user may build a unique analysis workflow, or may use a predetermined one, such as by clicking on, dragging, or otherwise selecting whatever work projects they desire to be run.

For instance, in one use model, a dashboard is presented with a GUI interface that may include a plurality of icons representing the various processes that may be implemented and run on the system. In such an instance, a user can click on or drag the selected work process icons into a workflow interface, so as to build a desired workflow process, which once built may be saved and used to establish the control instructions for the sample set barcodes. Once the desired work projects have been selected, the workflow management controller may configure the desired workflow processes, and then identify and select the resources for performing the selected analysis.

Once the workflow analysis process begins, the dashboard may be viewed so as to track progress through the system. For example, the dashboard may indicate how much data is running through the system, what processes are being run on the data, how much has been accomplished, how much processing remains, what workflows have been completed and which still need to be accessed, the latest projects to be run, and which runs have been completed. Essentially, full access to everything that's running on the system, or a sub-portion thereof, may be provided to the desktop.

Further, in various instances, the desktop may include various different user interfaces that may be accessible via one or more tabs. For instance, one tab for accessing the system controls may be a "local resources tab," which when selected allows a user to select control functions that are capable of being implemented locally. Another tab may be configured for accessing "cloud resources," which when selected allows a user to select other control functions that are capable of being implemented remotely. Accordingly, in interacting with the dashboard, a user can select which resources to perform which tasks, and as such can increase or decrease resource usage as required so as to meet the project requirements.

Hence, as the computational complexity increases, and/or increased speed is desired, the user (or the system itself, e.g., the WMS) can bring more and more resources online, as needed, such as by the mere click of a button, instructing the workflow manager to bring additional local and/or cloud based resources online, as needed to complete the task within the desired timeframe. In this manner, although the system is automated and/or controlled by the workflow manager controller, a user of the system can still set the control parameters, and when needed can bring cloud based resources online. Accordingly, the controller can expand to the cloud as needed to bring online additional processing and/or storage resources.

In various instances, the desktop interface may be configured as a mobile application or "app" that is accessible via a mobile device and/or desktop computer. Consequently, in one aspect, a genomics marketplace, or cohort, may be provided so as to allow a plurality of users to collaborate in one or more research projects, so as to form an electronic cohort marketplace that is accessible via the dashboard app, e.g., a web based browser interface. As such, the system may provide an online forum for performing collaborative research and/or a marketplace for developing various analytical tools for analyzing genetic data, which system may be accessible directly via the system interface, or via the app, to allow remote control of the system by a user.

For instance, as can be seen with reference to FIG. 43, in one aspect, an online app store is provided to allow users to develop, sell, and use genomics tools that can be incorporated into the system and be employed to analyze the genomic data transmitted to and entered into the system. Particularly, the genomic app store enables customers that desire to develop genetic tests, e.g., such as a NICU test, to do so, and once developed the test may be uploaded onto the system, e.g., the genetic marketplace, for purchase and running as a platform thereon, so that anyone running the newly developed system platform can deploy the uploaded tests via the web portal. More particularly, a user can browse the web portal "app" store, find a desired test, e.g., the NICU test, download it, and/or configure the system to implement it, such as on their uploadable genetic data. The online "cohort" marketplace, therefore, presents a rapid and efficient way to deploy new genetic analytic applications, which applications allow for identical results to be obtained from any of the present system platforms that run the downloaded application. More particularly, the online marketplace provides a mechanism for anyone to work with the system to develop genetic analysis applications that remote users can download and configure for use in accordance with the present workflow models.

Another aspect of the cohort marketplace disclosed herein is that it allows for the secure sharing of data. For instance, presently, genomic data is highly protected. Often such genetic data is large and difficult to transfer in a secure and protected manner, such as where the subject's identity is restricted. However, the present genetics marketplace allows cohort participants to share genetic data without having to identify the subject. In such a marketplace, cohort participants can share questions and processes so as to advance their research in a protected and secure environment, without risking the identity of their respective subjects' genomes. Additionally, a user can enlist the help of other researchers in the analysis of their sample sets without identifying to whom those genomes belong.

For instance, a user can identify subjects having a specific genotype and/or phenotype, such as stage 3 breast cancer, and/or having been treated with a particular drug. A cohort can be formed to see how these drugs affect cancerous cell growth on a genetic level. Therefore, these characteristics, amongst others, may form cohort selection criteria that will allow other researchers, e.g., remotely located, to perform standard genetic analyses on the genetic data, using uniform analytic procedures, on subjects they have access to that fit within the cohort criteria. In this manner, a given researcher need not be responsible for identifying and securing all members of a sample set, e.g., subjects fitting within the criteria, to substantiate his or her scientific inquiry.

Particularly, Researcher A may set up a research cohort within the marketplace, and identify the appropriate selection criteria for subjects, the genomic test(s) to be run, and the parameters by which the test is to be run. Researchers B and C, located remotely from Researcher A, may then sign up for the cohort, identify and select subjects matching the criteria, and then run the specified tests on their subjects, using the uniform procedures disclosed herein, so as to help Researcher A achieve or better accomplish his or her research goals in an expeditious manner. This is beneficial because only a portion of genetic data is being transmitted, subject identity is protected, and as the data is being analyzed using the same genetic analysis system employing the same parameters, the results data will be the same regardless of where and on what machine the test(s) are run. Consequently, the cohort marketplace allows users to form and build cohorts simply by posting the selection criteria and run parameters on the dashboard. Compensation rates may also be posted and payments rendered by employing a suitably configured commerce, e.g., monetary exchange, program.
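
A cohort definition of this kind might, purely by way of example, be expressed as a small configuration structure such as the following Python sketch; the keys, values, and matching helper are hypothetical rather than a prescribed schema.

```python
# Hypothetical cohort definition posted by the requester; values illustrative.
cohort_definition = {
    "cohort_id": "stage3-breast-cancer-drugX",
    "selection_criteria": {
        "phenotype": "stage 3 breast cancer",
        "treatment": "drug X",
    },
    "pipeline": {
        "steps": ["mapping", "aligning", "sorting", "variant_calling"],
        "reference": "GRCh38",
        "output": "VCF",
    },
    "reporting": {"upload": "results_only"},   # raw subject data stays on site
}

def matches_criteria(subject_record, criteria):
    """A participant scans locally stored subject records against the posted
    selection criteria before running the prescribed pipeline."""
    return all(subject_record.get(k) == v for k, v in criteria.items())
```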

Anyone that accepts participation in the cohort can then download the criteria and data file(s) and/or use genetic data of subjects they have already generated and/or stored in performing the requested analyses. For instance, each cohort participant will have, or be able to generate, a database of BCL and/or FASTQ files that are stored in their individual servers. These genetic files will have been derived from subjects who happen to meet the selection criteria. Specifically, this stored genetic and/or other data of the subject may be scanned so as to determine suitability for inclusion within the cohort selection criteria. Such data may have been generated for a number of purposes, but regardless of the reasons for the generation, once generated it may be selected and subjected to the requested pipeline analyses and used for inclusion within the cohort.

Accordingly, in various embodiments, the cohort system may be a forum for connecting researchers, so as to allow them to pool their resources and data, e.g., genetic sequence data. For example, engaging a cohort would allow a first researcher to introduce a project requiring genetic data analyses, such as the mining and/or examination of a number of genomes from various subjects, such as with respect to mapping, aligning, variant calling, and/or the like. Therefore, instead of having to gather subjects and collect sample sets individually, the cohort initiator can advertise the need for a prescribed analysis procedure to be run on sample sets previously collected, or to be collected, by others, and as such a collective approach to generating sample sets and analyzing the same is provided for by the cohort organization herein. Particularly, the cohort initiator can set up the cohort selection, create a configuration file to be shared with the potential cohort participants, create the workflow parameters, e.g., within a workflow folder, and can thereby automate data generation and analyses, e.g., via the workflow management system. The system may also enable the commercial aspect of the transaction, e.g., the payment processing for compensating the cohort participants for their provision of genetic data sets that may be analyzed, such as with respect to mapping, aligning, variant calling, and/or with respect to tertiary analyses.

In various embodiments, the cohort structured analyses may be directed to primary processing, e.g., of either DNA or RNA, such as with respect to image processing and/or base quality score recalibration, methylation analysis, and the like; and/or may be directed to the performance of secondary analysis, such as with respect to mapping, aligning, sorting, variant calling, and the like; and/or may be directed to tertiary analysis, such as with respect to genomic, epigenomic, metagenomic, joint genotyping, GATK, and/or other forms of tertiary analyses. Additionally, it is to be understood that although many of the pipelines and analyses performed thereby may involve primary and/or secondary processing, various analysis platforms herein may not be directed to primary or secondary processing. For instance, in certain instances, an analysis platform may be exclusively directed to performing tertiary analysis, such as on genetic data, or other forms of genomics and/or bioinformatics analyses.

For example, in particular embodiments, with respect to the particular analytical procedures to be run, the analyses to be performed may include one or more of mapping, aligning, sorting, variant calling, and the like, so as to produce results data that may be subjected to one or more other secondary and/or tertiary analysis procedures, depending on the specific pipelines selected to be run. The workflow may be simple or it may be complex, e.g., it may require the performance of one pipeline module, e.g., mapping, or multiple modules, such as mapping, aligning, sorting, variant calling, and/or others, but an important parameter is that the workflow should be identical for each person that takes part in the cohort. Particularly, a unique feature of the system is that the requester establishing the cohort sets forth the control parameters so as to ensure that the analyses to be performed are performed in the same manner, regardless of where those procedures are performed and on what machines.

Consequently, when setting up the cohort, the requester will upload both the selection criteria along with a configuration file. Other cohort participants will then view the selection criteria to determine if they have data sets of genetic information falling within the set forth criteria, and if so will perform the requested analysis on the data, based on the settings of the configuration file. Researchers may sign up to be selected as cohort participants, and if subscription is high, a lottery or competition can be held to select the participants. In various instances, a bidding system could be initiated. The results data generated by the cohort participants may be processed onsite or on the cloud, and as long as the configuration file is followed, the processing of the data will be the same. Particularly, the configuration file sets forth how the BioIT analytics device is to be configured, and once the device is set up in accordance with the prescribed configuration, a device associated with the system will perform the requested genetic analyses in the same manner regardless of where it is located, e.g., locally or remotely. The results data may then be uploaded onto the cohort marketplace, and payment tendered and received in view of the received results data.

For instance, the analysis of the genetic data may be performed locally, and the results uploaded onto the cloud, or the genetic data itself may be uploaded and the analyses run on the cloud, e.g., a server or server network associated with the cloud. In various instances, it may be useful to only upload the results data, so as to better protect the subjects' identities. Particularly, by uploading only results data, not only is security protected, but large amounts of data need not be transferred, thereby enhancing system efficiency.

More particularly, in various instances, a compressed file containing results data from one or more of the pipelines may be uploaded, and in some instances, only a file containing a description of variations need be uploaded. In some instances, only an answer need be given, such as a text answer, e.g., a "yes" or "no" answer. Such answers are preferable as they do not set forth the identity of the subject. However, if the analyses need to be performed online, e.g., in the cloud, selected BCL and/or FASTQ files may be uploaded, the analyses performed, and the results data may then be pushed back to the initial submitter, who can then upload the results data at the cohort interface. The original raw data may then be deleted from the online memory. In this and other such manners, the cohort requester will not have access to the identities of the subjects.

Compression, such as that employed in "just in time analysis" (JIT), is particularly useful in enhancing cohort efficiency. For instance, using typical procedures, the movement of data into and out of the cohort system is very expensive. Accordingly, although in various configurations raw and/or uncompressed data uploaded to the system may be stored there, in particular instances, the data can be compressed prior to being uploaded, the data may then be processed within the system, and the results can then be compressed prior to being transmitted out of the system, such as where the compression is effectuated in accordance with a JIT protocol. In this instance, storage of such data, such as in a compressed form, is less expensive, and therefore the cohort system is very cost efficient.

Additionally, in various instances, a plurality of cohorts may be provided within an online marketplace, and given the compression processes herein described, data may be transmitted from one cohort to another, so as to allow researchers of various different cohorts to share data between them, which without the compression methods disclosed herein could be prohibitively costly. Particularly, without the speed and efficiency of JIT compression, data once transmitted into the cloud would typically stay in the cloud, albeit it would be accessible therein for review and manipulation. However, JIT allows data to be quickly transmitted to and from the cloud for both local and/or cloud based processing. Further, as can be seen with respect to FIG. 41B, in particular instances, the system 1 may be configured for subjecting the generated and/or secondarily processed data to further processing, e.g., via a local 100 and/or a remote 300 computing resource, such as by running it through one or more tertiary processing pipelines, such as one or more of a genome pipeline, for instance, an epigenome pipeline, metagenome pipeline, joint genotyping, a GATK/MuTect2 pipeline, or other tertiary processing pipeline. The results data from such processing may then be compressed and/or stored locally 200 and/or be transferred so as to be stored remotely 400.

In additional instances, as can be seen with respect to FIG. 41C, the system 1 may include a further tier of processing modules, such as configured for rendering additional processing, e.g., of the secondary and/or tertiary processing results data, such as for diagnosis, disease and/or therapeutic discovery, and/or prophylaxis thereof. For instance, in various instances, an additional layer of processing may be provided, such as for disease diagnostics, therapeutic treatment, and/or prophylactic prevention, such as including NIPT, NICU, Cancer, LDT, AgBio, and other such disease diagnostics, prophylaxis, and/or treatments employing the data generated by one or more of the present primary and/or secondary and/or tertiary pipelines.

Accordingly, herein presented is a system 1 for producing and using a local and/or global hybrid cloud network 30/50. For instance, presently, the local cloud 30 is used primarily for storage, such as at a remote storage location 400. In such an instance, the computing of data is performed locally 100 by a local computing resource 140, and where storage needs are extensive, the cloud 30 is accessed so as to store the data generated by the local computing resource 140, such as by use of a remote storage resource 400. Hence, generated data is typically either wholly managed on site locally 100, or it is totally managed off site 300, on the cloud 30.

Particularly, in a general implementation of a bioinformatics analysis platform, the local computing 140 and/or storage 200 functions are maintained locally on site 100, and where storage needs exceed local storage capacity, or where there is a need for stored data to be made available to other remote users, such data may be transferred via the internet 30 to a global cloud 50 for remote storage 400 thereby. In such an instance, where the computing resources 140 required for performance of the computing functions are minimal, but the storage requirements extensive, the computing function 140 may be maintained locally 100, while the storage function 400 may be maintained remotely, with the fully processed data being transferred back and forth between the local processing function 140, such as for local processing only, and the storage function 400, such as for the remote storage 400 of the processed data, such as by employing the JIT protocols disclosed herein above.

For instance, this may be exemplified with respect to the sequencing function, such as with a typical NGS, where the data generation and/or computing resource 100 is configured for performing the functions required for the sequencing of the genetic material so as to produce genetic sequenced data, e.g., reads, which data is produced onsite 100 and/or transferred onsite locally. These reads, once generated, such as by the onsite NGS, may then be transferred, e.g., as a BCL or FASTQ file, over the cloud network 30, such as for storage 400 at a remote location 300 in a manner so as to be recalled from the cloud 30 when necessary, such as for further processing. For example, once the sequence data has been generated and stored, e.g., 400, the data may then be recalled, e.g., for local usage, such as for the performance of one or more secondary and/or tertiary processing functions, that is, at a location remote from the storage facility 400, e.g., locally 100. In such an instance, the local storage resource 200 serves merely as a storage cache where data is placed while awaiting transfer to or from the cloud 30, such as to or from the remote storage facility 400.

Likewise, where the computing function is extensive, such as requiring one or more remote computing servers or computing cluster cores 300 for processing the data, and where the storage demands for storing the processed data 200 are relatively minimal, as compared to the computing resources 300 required to process the data, the data to be processed may be sent, such as over the cloud 30, so as to be processed by a remote computing resource 300, which resource may include one or more cores or clusters of computing resources, e.g., one or more super computing resources. In such an instance, once the data has been processed by the cloud based computer core 300, the processed data may then be transferred over the cloud network 30 so as to be stored locally 200 and be readily available for use by the local computing resource 140, such as for local analysis and/or diagnostics. Of course, the remotely generated data 300 may also be stored remotely 400.

This may be exemplified with respect to a typical secondary processing function, such as where the pre-processed sequenced, e.g., read, data that is stored locally 200 is accessed, such as by the local computing resource 100, and transmitted over the cloud internet 30 to a remote computing facility 300 so as to be further processed thereby, e.g., in a secondary or tertiary processing function, to obtain processed results data that may then be sent back to the local facility 100 for storage 200 thereby. This may be the case where a local practitioner generates sequenced read data using a local data generating resource 100, e.g., an automated sequencer, so as to produce a BCL or FASTQ file, and then sends that data over the network 30 to a remote computing facility 300, which then runs one or more functions on that data, such as a Burrows-Wheeler transform or Needleman-Wunsch and/or Smith-Waterman alignment function on that sequence data, so as to generate results data, e.g., in a SAM file format, that may then be compressed and transmitted over the internet 30, e.g., as a BAM file, to the local computing resource 100 so as to be examined thereby in one or more locally administered processing protocols, such as for producing a VCF, which may then be stored locally 200. In various instances the data may also be stored remotely 400.

What is needed, however, is a seamless integration of the engagement between local 100 and remote 300 computer processing as well as between local 200 and remote 400 storage, such as in the hybrid cloud 50 based system presented herein. In such an instance, the system can be configured such that local 100 and remote 300 computing resources are configured so as to run seamlessly together, such that data to be processed thereby can be allocated in real time to either the local 100 or the remote 300 computing resource without paying an extensive penalty due to transfer rate and/or loss in operational efficiency. This may be the case, for instance, where the software and/or hardware and/or quantum processing to be deployed or otherwise run by the computing resources are configured so as to correspond to one another and/or are the same or functionally similar, e.g., the hardware and/or software is configured in the same manner so as to run the same algorithms in the same manner on the generated and/or received data.

For instance, as can be seen with respect to FIG. 41A, a local computing resource 100 may be configured for generating or for receiving generated data, and therefore may include a data generating mechanism 110, such as for primary data generation and/or analysis, e.g., so as to produce a BCL and/or a FASTQ sequence file. This data generating mechanism 110 may be or may be associated with a local computer 100, as described herein throughout, having a processor 140 that may be configured to run one or more software applications and/or may be hardwired so as to perform one or more algorithms, such as in a wired configuration, on the generated and/or acquired data. For example, the data generating mechanism 110 may be configured for one or more of generating data, such as sequencing data 111. In various embodiments, the generated data may be sensed data 111 a, such as data that is detectable as a change in voltage, ion concentration, electromagnetic radiation, and the like; and/or the data generating mechanism 110 may be configured for generating and/or processing signal, e.g., analog or digital signal, data, such as data representing one or more nucleotide identities in a sequence or chain of associated nucleotides. In such an instance, the data generating mechanism 110, e.g., sequencer 111, may further be configured for preliminarily processing the generated data so as to perform signal processing 111 b or to perform one or more base call operations 111 c, such as on the data, so as to produce sequence identity data, e.g., a BCL and/or FASTQ file.

It is to be noted that in this instance, the data 111 so produced may be generated locally and directly, such as by a local data generating 110 and/or computing resource 140, e.g., a sequencer on a chip. Alternatively, the data may be produced locally and indirectly, e.g., by a remote computing and/or generating resource, such as a remote NGS. The data, e.g., in BCL and/or FASTQ file format, once produced may then be transferred indirectly over the local cloud 30 to the local computing resource 100, such as for secondary processing 140 and/or storage thereby in a local storage resource 200, such as while awaiting further local processing 140. In such an instance, where the data generation resource is remote from the local processing 100 and/or storage 200 resources, the corresponding resources may be configured such that the remote and/or local storage, remote and local processing, and/or communication protocols employed by each resource may be adapted to smoothly and/or seamlessly integrate with one another, e.g., by running the same, similar, and/or equivalent software, and/or by having the same, similar, and/or equivalent hardware configurations, and/or by employing the same communications and/or transfer protocols, which, in some instances, may have been implemented at the time of manufacture or later.

Specifically, in one implementation, these functions may be implemented in a hardwired configuration, such as where the sequencing function and the secondary processing function are maintained upon the same or an associated chip or chipset, e.g., such as where the sequencer and secondary processor are directly interconnected on a chip, as herein described. In other implementations, these functions may be implemented on two or more separate devices via software, e.g., on a quantum processor, CPU, or GPU that has been optimized to allow the two remote devices to communicate seamlessly with one another. In other implementations, a combination of optimized hardware and software implementations for performing the recited functions may also be employed.

More specifically, the same configurations may be implemented with respect to the performance of the mapping, aligning, sorting, variant calling, and/or other functions that may be deployed by the local 100 and/or remote 300 computing resources. For example, the local computing 100 and/or remote 300 resources may include software and/or hardware configured for performing one or more secondary 600 or tertiary 700 tiers of processing functions 112-115 on locally and/or remotely generated data, such as genetic sequence data, in such a manner that the processing and results thereof may be seamlessly shared with one another and/or stored thereby. Particularly, the local computing function 100 and/or the remote computing function 300 may be configured for generating and/or receiving primary data, such as genetic sequence data, e.g., in a BCL and/or a FASTQ file format, and running one or more secondary 600 or tertiary 700 processing protocols on that generated and/or acquired data. In such an instance, one or more of these protocols may be implemented in a software, hardware, or combinational format, such as run on a quantum processor, a CPU, and/or a GPU. For instance, the data generating 110 and/or the local 140 and/or the remote 300 processing resource may be configured for performing one or more of a mapping operation 112, an alignment operation 113, or other related function 114 on the acquired or generated data in software and/or in hardware.

Accordingly, in various embodiments, the data generating resource, such as the sequencer 111, e.g., NGS or sequencer on a chip, whether implemented in software and/or in hardware, or a combination of the same, may further be configured to include an initial tier of processors 500, such as a scheduler, various analytics, comparers, graphers, releasers, and the like, so as to assist the data generator 111, e.g., sequencer, in converting biological information into raw read data, such as in a BCL or FASTQ file format 111 d. Further, the local computing 100 resource, whether implemented in software and/or in hardware, or a combination of the same, may further be configured to include a further tier of processors 600, such as may include a mapping engine 112, or may otherwise include programming for running a mapping algorithm on the genetic sequence data, such as for performing a Burrows-Wheeler transform and/or other algorithms for building a hash table and/or running a hash function 112 a on said data, such as for hash seed mapping, so as to generate mapped sequence data. Further still, the local computing 100 resource, whether implemented in software and/or in hardware, or a combination of the same, may further be configured such that this tier of processors 600 also includes an alignment engine 113, as herein described, or otherwise includes programming for running an alignment algorithm on the genetic sequence data, e.g., mapped sequenced data, such as for performing a gapped and/or gapless Smith-Waterman alignment, and/or Needleman-Wunsch, or other like scoring algorithm 113 a on said data, so as to generate aligned sequence data.
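
By way of illustration only, the following is a minimal sketch, in Python, of the kind of hash seed mapping that a mapping engine 112 might perform; the k-mer length, function names, and the toy reference string are illustrative assumptions and are not part of the disclosed hardwired implementation.

    # Illustrative sketch only: hash-seed mapping of a read against a reference.
    # The k-mer length, names, and toy data below are assumptions for illustration.
    def build_seed_hash(reference, k=8):
        """Build a hash table mapping each k-mer seed to its reference positions."""
        table = {}
        for pos in range(len(reference) - k + 1):
            table.setdefault(reference[pos:pos + k], []).append(pos)
        return table

    def map_read(read, table, k=8):
        """Return candidate reference positions implied by the read's seeds."""
        candidates = {}
        for offset in range(len(read) - k + 1):
            for pos in table.get(read[offset:offset + k], []):
                start = pos - offset  # implied alignment start for this seed hit
                candidates[start] = candidates.get(start, 0) + 1
        # positions supported by the most seeds are the strongest mapping candidates
        return sorted(candidates, key=candidates.get, reverse=True)

    reference = "ACGTACGTTAGCCGATTACAGGT"
    print(map_read("TAGCCGAT", build_seed_hash(reference)))  # e.g., [8]

In practice the candidate positions returned by such a seed lookup would then be handed to the gapped or gapless alignment scoring stage 113 a described above.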

The local computing 100 and/or data generating resource 110 may also be configured to include one or more other modules 114, whether implemented in software and/or in hardware, or a combination of the same, which may be adapted to perform one or more other processing functions on the genetic sequence data, such as on the mapped and/or aligned sequence data. Thus, the one or more other modules may include a suitably configured engine 114, or otherwise include programming, for running the one or more other processing functions, such as a sorting 114 a, deduplication 114 b, recalibration 114 c, local realignment 114 d, duplicate marking 114 f, Base Quality Score Recalibration 114 g function(s) and/or a compression function (such as to produce a SAM, Reduced BAM, and/or a CRAM compression and/or decompression file) 114 e, in accordance with the methods herein described. In various instances, one or more of these processing functions may be configured as one or more pipelines of the system 1.
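
Purely as an illustration of the kind of post-alignment pass such a module 114 might perform, the sketch below coordinate-sorts a set of aligned records and marks all but the best-quality copy at each position as duplicates; the record layout and field names are hypothetical and not drawn from the disclosure.

    # Illustrative sketch: coordinate sorting 114 a and duplicate marking 114 f.
    # The record fields (chrom, pos, strand, qual) are hypothetical placeholders.
    def sort_and_mark_duplicates(records):
        records = sorted(records, key=lambda r: (r["chrom"], r["pos"]))  # coordinate sort
        best = {}  # best-quality record seen per (chrom, pos, strand) key
        for r in records:
            key = (r["chrom"], r["pos"], r["strand"])
            if key not in best or r["qual"] > best[key]["qual"]:
                best[key] = r
        for r in records:
            key = (r["chrom"], r["pos"], r["strand"])
            r["duplicate"] = r is not best[key]  # all but the best copy are marked
        return records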

Likewise, the system 1 may be configured to include a module 115, whether implemented in software and/or in hardware, or a combination of the same, which may be adapted for processing the data, e.g., the sequenced, mapped, aligned, and/or sorted data, in a manner such as to produce a variant call file 116. Particularly, the system 1 may include a variant call module 115 for running one or more variant call functions, such as a Hidden Markov Model (HMM) and/or GATK function 115 a, such as in a wired configuration and/or via one or more software applications, e.g., either locally or remotely, and/or a converter 115 b for the same. In various instances, this module may be configured as one or more pipelines of the system 1.
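
The variant calling itself is described herein as an HMM/haplotype-based process; as a much-simplified stand-in for illustration only, the toy sketch below merely tallies the pileup bases at a position and emits a VCF-like record when the non-reference fraction crosses a threshold. All names, the threshold, and the record layout are illustrative assumptions, not the disclosed caller.

    # Toy illustration only: emit a VCF-like record from a pileup column.
    # This is a naive frequency threshold, not the HMM-based caller described herein.
    from collections import Counter

    def call_site(chrom, pos, ref_base, pileup_bases, min_alt_fraction=0.2):
        counts = Counter(pileup_bases)
        alt, alt_count = max(
            ((b, n) for b, n in counts.items() if b != ref_base),
            key=lambda item: item[1],
            default=(None, 0),
        )
        if alt and alt_count / len(pileup_bases) >= min_alt_fraction:
            # CHROM  POS  ID  REF  ALT  QUAL  FILTER  INFO
            return f"{chrom}\t{pos}\t.\t{ref_base}\t{alt}\t.\tPASS\tDP={len(pileup_bases)}"
        return None

    print(call_site("chr1", 12345, "A", list("AAAGAGGGAA")))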

In particular embodiments, as set forth in FIG. 41B, the system 1 may include a local computing function 100 that may be configured for employing a computer processing resource 140 for performing one or more further processing functions on data, e.g., BCL and/or FASTQ data, generated by the system generator 110 or acquired by the system acquisition mechanism 120 (as described below), such as by being transferred thereto, for instance, by a third party 121, such as via a cloud 30 or hybrid cloud network 50. For example, a third party analyzer 121 may deploy a remote computing resource 300 so as to generate relevant data in need of further processing, such as genetic sequence data or the like, which data may be communicated to the system 1 over the network 30/50 so as to be further processed. This may be useful, for instance, where the remote computing resource 300 is an NGS configured for taking raw biological data and converting it to a digital representation thereof, such as in the form of one or more FASTQ files containing reads of genetic sequence data, and where further processing is desired, such as to determine how the generated sequence of an individual differs from that of one or more reference sequences, as herein described, and/or where it is desired to subject the results thereof to further, e.g., tertiary, processing.

In such an instance, the system 1 may be adapted so as to allow one or more parties, e.g., a primary and/or secondary and/or third party user, to access the associated local processing resources 100, and/or a suitably configured remote processing resource 300 associated therewith, in a manner so as to allow the user to perform one or more quantitative and/or qualitative processing functions 152 on the generated and/or acquired data. For instance, in one configuration, the system 1 may include, e.g., in addition to primary and/or secondary 600 processing pipelines, a third tier of processing modules 700, which processing modules may be configured for performing one or more processing functions on the generated and/or acquired primary and/or secondary processed data.

Particularly, in one embodiment, the system 1 may be configured for generating and/or receiving processed genetic sequence data 111 that has been either remotely or locally mapped 112, aligned 113, sorted 114 a, and/or further processed 114 so as to generate a variant call file 116, which variant call file may then be subjected to further processing, such as within the system 1, such as in response to second and/or third party analytics requests 121. More particularly, the system 1 may be configured to receive processing requests from a third party 121, and further be configured for performing such requested secondary 600 and/or tertiary 700 processing on the generated and/or acquired data. Specifically, the system 1 may be configured for producing and/or acquiring genetic sequence data 111, may be configured for taking that genetic sequence data and mapping 112, aligning 113, and/or sorting 114 a it to produce one or more variant call files (VCFs) 116, and additionally the system 1 may be configured for performing a tertiary processing function 700 on the data, e.g., with respect to the one or more VCFs.

The system 1 may be configured so as to perform any form of tertiary processing 700 on the generated and/or acquired data, such as by subjecting it to one or more pipeline processing functions 700, such as to generate genome data 122 a, epigenome data 122 b, metagenome data 122 c, and the like, including joint genotyping 122 d, GATK 122 e, and/or MuTect2 122 f analysis pipelines, among other potential data analytic pipelines. Further, the system 1 may be configured for performing an additional tier of processing 800 on the generated and/or processed data, such as including one or more of non-invasive prenatal testing (NIPT) 123 a, N/P ICU 123 b, cancer related diagnostics and/or therapeutic modalities 123 c, various laboratory developed tests (LDT) 123 d, agricultural biological (Ag Bio) applications 123 e, or other such health care related 123 f processing functions.

Hence, in various embodiments, where a primary user may access and/or configure the system 1 and its various components directly, such as through direct access therewith, e.g., through the local computing resource 100, as presented herein, the system 1 may also be adapted for being accessed by a secondary party, such as one connected to the system 1 via a local network or intranet connection 10, so as to configure and run the system 1 within the local environment. Additionally, in certain embodiments, the system may be adapted for being accessed and/or configured by a third party 121, such as over an associated hybrid-cloud network 50 connecting the third party 121 to the system 1, such as through an application program interface (API), accessible as through one or more graphical user interface (GUI) components. Such a GUI may be configured to allow the third party user to access the system 1 and, using the API, to configure the various components of the system, the modules, associated pipelines, and other associated data generating and/or processing functionalities so as to run only those system components necessary and/or useful to the third party and/or requested or desired to be run thereby.

Accordingly, in various instances, the system 1 as herein presented may be adapted so as to be configurable by a primary, secondary, or tertiary user of the system. In such an instance, the system 1 may be adapted to allow the user to configure the system 1 and thereby to arrange its components in such a manner as to deploy one, all, or a selection of the analytical system resources, e.g., 152, to be run on data that is either generated, acquired, or otherwise transferred to the system, e.g., by the primary, secondary, or third party user, such that the system 1 runs only those portions of the system necessary or useful for running the analytics requested by the user to obtain the desired results thereof. For example, for these and other such purposes, an API may be included within the system 1, wherein the API is configured so as to include or otherwise be operably associated with a graphical user interface (GUI), including an operable menu and/or a related list of system function calls from which the user can select and/or otherwise make calls so as to configure and operate the system and its components as desired.

In such an instance, the GUI menu and/or system function calls may direct the user selectable operations of one or more of a first tier of operations 600, including: sequencing 111, mapping 112, aligning 113, sorting 114 a, variant calling 115, and/or other associated functions 114, in accordance with the teachings herein, such as with relation to the primary and/or secondary processing functions herein described. Further, where desired, the GUI menu and/or system function calls may direct the operations of one or more of a second tier of operations 700, including: a genome pipeline 122 a, epigenome pipeline 122 b, metagenome pipeline 122 c, a joint genotyping pipeline 122 d, GATK 122 e and/or MuTect2 122 f analysis pipelines. Furthermore, where desired, the GUI menu and system function calls may direct the user selectable operations of one or more of a third tier of operations 800, including: non-invasive prenatal testing (NIPT) 123 a, N/P ICU 123 b, cancer related diagnostics and/or therapeutic modalities 123 c, various laboratory developed tests (LDT) 123 d, agricultural biological (Ag Bio) applications 123 e, or other such health care related 123 f processing functions.
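
A hypothetical sketch of such a tiered, user-selectable menu of system function calls is given below; the identifiers, tier keys, and the validation logic are illustrative assumptions only, mirroring the tiers just described rather than reproducing any disclosed interface.

    # Hypothetical sketch of the tiered, user-selectable menu of system function calls.
    # Tier contents mirror the description above; identifiers are illustrative only.
    MENU = {
        600: ["sequencing", "mapping", "aligning", "sorting", "variant_calling"],
        700: ["genome", "epigenome", "metagenome", "joint_genotyping", "gatk", "mutect2"],
        800: ["nipt", "nicu", "cancer", "ldt", "agbio"],
    }

    def build_run_plan(selections):
        """Validate GUI selections and return them ordered by processing tier."""
        unknown = set(selections) - {op for ops in MENU.values() for op in ops}
        if unknown:
            raise ValueError(f"unsupported selections: {sorted(unknown)}")
        plan = []
        for tier in sorted(MENU):
            for op in MENU[tier]:
                if op in selections:
                    plan.append((tier, op))
        return plan

    print(build_run_plan({"mapping", "aligning", "sorting", "variant_calling", "gatk"}))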

Accordingly, the menu and system function calls may include one or more primary, secondary, and/or tertiary processing functions, so as to allow the system and/or its component parts to be configured, such as with respect to performing one or more data analysis pipelines as selected and configured by the user. In such an instance, the local computing resource 100 may be configured to correspond to and/or mirror the remote computing resource 300, and/or likewise the local storage resource 200 may be configured to correspond to and/or mirror the remote storage resource 400, so that the various components of the system may be run and/or the data generated thereby may be stored either locally or remotely in a seamless distributed manner as chosen by the user of the system 1. Additionally, in particular embodiments, the system 1 may be made accessible to third parties, for running proprietary analysis protocols 121 a on the generated and/or processed data, such as by running through an artificial intelligence interface designed to find correlations therebetween.

The system 1 may be configured so as to perform any form of tertiary processing on the generated and/or acquired data. Hence, in various embodiments, a primary, secondary, or tertiary user may access and/or configure any level of the system 1 and its various components either directly, such as through direct access with the computing resource 100, indirectly, such as via a local network connection 30, or over an associated hybrid-cloud network 50 connecting the party to the system 1, such as through an appropriately configured API having the appropriate permissions. In such an instance, the system components may be presented as a menu, such as a GUI selectable menu, where the user can select from all the various processing and storage options desired to be run on the user presented data. Further, in various instances, the user may upload their own system protocols so as to be adopted and run by the system, so as to process various data in a manner designed and selected for by the user. In such an instance, the GUI and associated API will allow the user to access the system 1 and, using the API, to add to and configure the various components of the system, the modules, associated pipelines, and other associated data generating and/or processing functionalities so as to run only those system components necessary and/or useful to the party and/or requested or desired to be run thereby.

The above discussion with respect to FIGS. 41A and 41B is directed to data generation 110, such as local data generation 100 employing a local computing resource 140. As indicated above, and with respect to FIG. 41C, one or more of the above demarcated modules, and their respective functions and/or associated resources, may be configured for being performed remotely, such as by a remote computing resource 300, and the resulting data may further be adapted to be transmitted to the system 1, such as in a seamless transfer protocol over a global cloud based internet connection 50, such as via a suitably configured data acquisition mechanism 120.

Accordingly, in such an instance, the local computing resource 100 may include a data acquisition mechanism 120, such as configured for transmitting and/or receiving such acquired data and/or associated information. For instance, the system 1 may include a data acquisition mechanism 120 that is configured in a manner so as to allow the continued processing and/or storage of data to take place in a seamless and steady manner, such as over a cloud or hybrid based network 30/50, where the processing functions are distributed both locally 100 and/or remotely 300, and likewise where one or more of the results of such processing may be stored locally 200 and/or remotely 400, such that the system seamlessly allocates to which local or remote resource a given job is to be sent for processing and/or storage, regardless of where the resource is physically positioned. Such distributed processing, transferring, and acquisition may include one or more of sequencing 111, mapping 112, aligning 113, sorting 114 a, duplicate marking 114 c, deduplication, recalibration 114 d, local realignment 114 e, Base Quality Score Recalibration 114 f function(s) and/or a compression function 114 g, as well as a variant call function 116, as herein described. Whether stored locally 200 or remotely 400, the processed data, in whatever state it is in within the process, may be made available to either the local 100 or remote processing 300 resources, such as for further processing prior to re-transmission and/or re-storage.
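
As a hypothetical illustration of how such an allocation decision might be made, the sketch below routes a job locally or remotely by comparing the estimated transfer-plus-remote time against the estimated local time; the thresholds, rates, and function names are assumptions rather than any disclosed scheduling policy.

    # Hypothetical allocation sketch: route a job locally 100 or remotely 300 by
    # comparing estimated transfer-plus-remote time against estimated local time.
    def allocate(job_bytes, local_seconds, remote_seconds, link_bytes_per_sec):
        transfer = job_bytes / link_bytes_per_sec
        return "remote" if transfer + remote_seconds < local_seconds else "local"

    # e.g., a 30 GB FASTQ job over a roughly 125 MB/s (1 Gb/s) link
    print(allocate(30e9, local_seconds=3600, remote_seconds=900, link_bytes_per_sec=125e6))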

Specifically, the system 1 may be configured for producing and/or acquiring genetic sequence data 111, may be configured for taking that genetic sequence data and processing it locally 140, or transferring the data over a suitably configured cloud 30 or hybrid cloud 50 network, such as to a remote processing facility for remote processing 300. Further, once processed, the system 1 may be configured for storing the processed data remotely 400 or transferring it back for local storage 200. Accordingly, the system 1 may be configured for either local or remote generation and/or processing of data, such as where the generation and/or processing steps may be from a first tier of primary and/or secondary processing functions 600, which tier may include one or more of: sequencing 111, mapping 112, aligning 113, and/or sorting 114 a, so as to produce one or more variant call files (VCFs) 116.

Likewise, the system 1 may be configured for either local or remote generation and/or processing of data, such as where the generation and/or processing steps may be from a second tier of tertiary processing functions 700, which tier may include one or more of generating and/or acquiring data pursuant to a genome pipeline 122 a, epigenome pipeline 122 b, metagenome pipeline 122 c, a joint genotyping pipeline 122 d, GATK 122 e and/or MuTect2 122 f analysis pipeline. Additionally, the system 1 may be configured for either local or remote generation and/or processing of data, such as where the generation and/or processing steps may be from a third tier of tertiary processing functions 800, which tier may include one or more of generating and/or acquiring data related to and including: non-invasive prenatal testing (NIPT) 123 a, N/P ICU 123 b, cancer related diagnostics and/or therapeutic modalities 123 c, various laboratory developed tests (LDT) 123 d, agricultural biological (Ag Bio) applications 123 e, or other such health care related 123 f processing functions.

In particular embodiments, as set forth in FIG. 41C, the system 1 may further be configured for allowing one or more parties to access the system and transfer information to or from the associated local processing 100 and/or remote 300 processing resources, as well as to store information either locally 200 or remotely 400, in a manner that allows the user to choose what information gets processed and/or stored where on the system 1. In such an instance, a user can not only decide what primary, secondary, and/or tertiary processing functions get performed on generated and/or acquired data, but also how those resources get deployed, and/or where the results of such processing get stored. For instance, in one configuration, the user may select whether data is generated either locally or remotely, or a combination thereof, whether it is subjected to secondary processing, and if so, which modules of secondary processing it is subjected to, and/or which resource runs which of those processes, and further may determine whether the then generated or acquired data is further subjected to tertiary processing, and if so, which modules and/or which tiers of tertiary processing it is subjected to, and/or which resource runs which of those processes, and likewise, where the results of those processes are stored for each step of the operations.

Particularly, in one embodiment, the user may configure the system 1 of FIG. 41A so that the generating of genetic sequence data 111 takes place remotely, such as by an NGS, but the secondary processing 600 of the data occurs locally 100. In such an instance, the user can then determine which of the secondary processing functions occur locally 100, such as by selecting the processing functions, such as mapping 112, aligning 113, sorting 111, and/or producing a VCF 116, from a menu of available processing options. The user may then select whether the locally processed data is subjected to tertiary processing, and if so, which modules are activated so as to further process the data, and whether such tertiary processing occurs locally 100 or remotely 300. Likewise, the user can select various options for the various tiers of tertiary processing options, and where any generated and/or acquired data is to be stored, either locally 200 or remotely 400, at any given step or time of operation.

More particularly, a primary user may configure the system to receive processing requests from a third party, where the third party may configure the system for performing such requested primary, secondary, and/or tertiary processing on generated and/or acquired data. Specifically, the user or second and/or third party may configure the system 1 for producing and/or acquiring genetic sequence data, either locally 100 or remotely 200. Additionally, the user may configure the system 1 for taking that genetic sequence data and mapping, aligning, and/or sorting it, either locally or remotely, so as to produce one or more variant call files (VCFs). Additionally, the user may configure the system for performing a tertiary processing function on the data, e.g., with respect to the one or more VCFs, either locally or remotely.

More particularly still, the user or other party may configure the system 1 so as to perform any form of tertiary processing on the generated and/or acquired data, and may determine where in the system that processing is to occur. Hence, in various embodiments, the first, second, and/or third party 121 user may access and/or configure the system 1 and its various components directly, such as by directly accessing the local computing function 100, via a local network connection 30, or over an associated hybrid-cloud network 50 connecting the party 121 to the system 1, such as through an application program interface (API), accessible as through one or more graphical user interface (GUI) components. In such an instance, the third party user may access the system 1 and use the API to configure the various components of the system, the modules, associated pipelines, and other associated data generating and/or processing functionalities so as to run only those system components necessary and/or useful to the third party and/or requested or desired to be run thereby, and may further allocate which computing resources will provide the requested processing, and where the results data will be stored.

Accordingly, in various instances, the system 1 may be configurable by a primary, secondary, or tertiary user of the system, who can configure the system 1 so as to arrange its components in such a manner as to deploy one, all, or a selection of the analytical system resources to be run on data that the user either directly generates, causes to be generated by the system 1, or causes to be transferred to the system 1, such as over a network associated therewith, such as via the data acquisition mechanism 120. In such a manner, the system 1 is configurable so as to run only those portions of the system necessary or useful for the analytics desired and/or requested by the requesting party. For example, for these and other such purposes, an API may be included wherein the API is configured so as to include a GUI operable menu and/or a related list of system function calls from which the user can select so as to configure and operate the system as desired.

Additionally, in particular embodiments, the system 1 may be made accessible to third parties, such as governmental regulators, e.g., the Food and Drug Administration (FDA) 70 b, or may allow third parties to collate, compile, and/or access a database of genetic information derived or otherwise acquired and/or compiled by the system 1, so as to form an electronic medical records (EMR) database 70 a and/or to allow governmental access and/or oversight of the system, such as by the FDA for drug development evaluation. The system 1 may also be set up to conglomerate, compile, and/or annotate the data 70 c and/or allow other high level users access thereto.

Accordingly, in various embodiments, as can be seen with respect to FIG. 42A, a hybrid cloud 50 is provided wherein the hybrid cloud is configured for connecting a local computing 100 and/or storage resource 200 with a remote computing 300 and/or storage 400 resource, such as where the local and remote resources are separated one from the other distally, spatially, geographically, and the like. In such an instance, the local and distal resources may be configured for communicating with one another in a manner so as to share information, such as digital data, seamlessly between the two. Particularly, the local resources may be configured for performing one or more types of processing on the data, such as prior to transmission across the hybrid network 50, and the remote resources may be configured for performing one or more types of further processing of the data.

For instance, in one particular configuration, the system 1 may be configured such that a generating and/or analyzing function 152 is configured for being performed locally 100 by a local computing resource, such as for the purpose of performing a primary and/or secondary processing function, so as to generate and/or process genetic sequence data, as herein described. Additionally, in various embodiments, the local resources may be configured for performing one or more tertiary processing functions on the data, such as one or more of genome, exome, and/or epigenome analysis, or a cancer, microbiome, and/or other DNA/RNA processing analysis. Further, where such processed data is meant to be transferred, such as to a remote computing 300 and/or storage 400 resource, the data may be transformed, such as by a suitably configured transformer 151, which transformer 151 may be configured for indexing, converting, compressing, and/or encrypting the data, such as prior to transfer over the hybrid network 50.

In particular instances, such as where the generated and processed data is transferred to a remote computing resource 300 for further processing, such processing may be of a global nature and may include receiving data from a plurality of local computing resources 100, collating such pluralities of data, annotating the data, and comparing the same, such as to interpret the data, determine trends thereof, analyze the same for various biomarkers, and aid in the development of diagnostics, therapeutics, and/or prophylactics. Accordingly, in various instances, the remote computing resource 300 may be configured as a data processing hub, such as where data from a variety of sources may be transferred, processed, and/or stored while waiting to be transformed and/or transferred, such as by being accessed by the local computing resource 100. More particularly, the remote processing hub 300 may be configured for receiving data from a plurality of resources 100, processing the same, and distributing the processed data back to the variety of local resources 100 so as to allow for collaboration amongst researchers and/or resources 100. Such collaboration may include various data sharing protocols, and may additionally include preparing the data to be transferred, such as by allowing a user of the system 1 to select amongst various security protocols and/or privacy settings so as to control how the data will be prepared for transfer.

In one particular instance, as presented in FIG. 42B, a local computing 100 and/or storage 200 resource is provided, such as on-site at a user's location. The computing resource 100 and/or storage 200 resource may be coupled to a data generating resource 121, such as an NGS or sequencer on a chip, as herein described, such as over a direct or an intranet connection 10, where the sequencer 121 is configured for generating genetic sequencing data, such as BCL and/or FASTQ files. For instance, the sequencer 121 may be part of and/or housed in the same apparatus as that of the computing resource 100 and/or storage unit 200, so as to have a direct communicable and/or operable connection therewith, or the sequencer 121 and computing resource 100 and/or storage resource 200 may be part of separate apparatuses from one another, but housed in the same facility, and thus connected over a cabled or intranet 10 connection. In some instances, the sequencer 121 may be housed in a separate facility from that of the computing 100 and/or storage 200 resource, and thus may be connected over an internet 30 or hybrid cloud connection 50.

In such instances, the genetic sequence data may be processed 100 and stored locally 200 prior to being transformed by a suitably configured transformer 151, or the generated sequence data may be transmitted directly to one or more of the transformer 151 and/or analyzer 152, such as over a suitably configured local connection 10, intranet 30, or hybrid cloud connection 50, as described above, such as prior to being processed locally. Particularly, like the data generating resource 121, the transformer 151 and/or analyzer 152 may be part of and/or housed in the same apparatus as that of the computing resource 100 and/or storage unit 200, so as to have a direct communicable and/or operable connection therewith, or the transformer 151 and/or analyzer 152 and the computing resource 100 and/or storage resource 200 may be part of separate apparatuses from one another, but housed in the same facility, and thus connected over a cabled or intranet 10 connection. In some instances, the transformer 151 and/or analyzer 152 may be housed in a separate facility from that of the computing 100 and/or storage 200 resource, and thus may be connected over an internet 30 or hybrid cloud connection 50.

For instance, the transformer 151 may be configured for preparing the data to be transmitted either prior to analysis or post analysis, such as by a suitably configured computing resource 100 and/or analyzer 152. For instance, the analyzer 152 may perform a secondary and/or tertiary processing function on the data, as herein described, such as for analyzing the generated sequence data with respect to determining its genomic and/or exomic characteristics 152 a, its epigenomic features 152 b, any various DNA and/or RNA markers of interest and/or indicators of cancer 152 c, and its relationships to one or more microbiomes 152 d, as well as one or more other secondary and/or tertiary processes as described herein.

As indicated, the generated and/or processed data may be transformed, such as by a suitably configured transformer 151, such as prior to transmission throughout the system 1 from one component thereof to another, such as over a direct, local 10, internet 30, or hybrid cloud 50 connection. Such transformation may include one or more of: conversion 151 d, such as where the data is converted from one form to another; comprehension 151 c, including the coding, decoding, and/or otherwise taking data from an incomprehensible form and transforming it to a comprehensible form, or from one comprehensible form to another; indexing 151 b, such as including compiling and/or collating the generated data from one or more resources, and making it locatable and/or searchable, such as via a generated index; and/or encryption 151 a, such as creating a lockable and unlockable, password protected dataset, such as prior to transmission over an internet 30 and/or hybrid cloud 50.
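
As an illustrative sketch only, the transformer operations just enumerated might be chained as follows prior to transfer; the record layout and function names are assumptions, and the encryption step assumes the third-party Python "cryptography" package rather than any particular disclosed cipher.

    # Illustrative transformer 151 sketch: convert 151 d, index 151 b, compress,
    # and encrypt 151 a a batch of records before transfer. Assumes the third-party
    # "cryptography" package for the encryption step; all names are illustrative.
    import gzip, json
    from cryptography.fernet import Fernet

    def transform_for_transfer(records, key):
        payload = json.dumps(records).encode("utf-8")          # conversion 151 d
        index = {r["id"]: i for i, r in enumerate(records)}    # indexing 151 b
        compressed = gzip.compress(payload)                    # compression
        encrypted = Fernet(key).encrypt(compressed)            # encryption 151 a
        return encrypted, index

    key = Fernet.generate_key()
    blob, idx = transform_for_transfer([{"id": "read1", "seq": "ACGT"}], key)
    original = json.loads(gzip.decompress(Fernet(key).decrypt(blob)))
    print(idx, original)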

Hence, as can be seen with respect to FIG. 42C, in these and/or other such instances, the hybrid cloud 50 may be configured for allowing seamless and protected transmission of data throughout the components of the system, such as where the hybrid cloud 50 is adapted to allow the various users of the system to configure its component parts and/or the system itself so as to meet the research, diagnostic, therapeutic, and/or prophylactic discovery and/or development needs of the user. Particularly, the hybrid cloud 50 and/or the various components of the system 1 may be operably connected with compatible and/or corresponding API interfaces that are adapted to allow a user to remotely configure the various components of the system 1 so as to deploy the resources desired in the manner desired, and further to do so either locally, remotely, or a combination of the same, such as based on the demands of the system and the particulars of the analyses being performed, all the while being enabled to communicate in a secured, encryptable environment.

In particular instances, the system 1 may include a processing architecture 310, such as an interpreter, that is configured for performing an interpreting function 310. The interpreter 310 may perform one or a series of analytic functions on generated data, such as annotation 311, interpretation 312, diagnostics 313, and/or a detection and/or an analysis function for determining the presence of one or more biomarkers, such as in the genetic data. The interpreter 310 may be part of or separate from the local computing resource 100, such as where the interpreter 310 is coupled to the computing resource 100 via a cloud interface, such as a hybrid cloud 50.

Further, an additional processing architecture 320 may be included, such as where the architecture 320 is configured as a collaborator. The collaborator 320 may be configured for performing one or more functions directed to ensuring the security and/or privacy of data to be transmitted. For instance, the collaborator may be configured for securing the data sharing process 321, for ensuring the privacy of transmission 322, for setting control parameters 323, and/or for initiating a security protocol 324. The collaborator 320 is configured for allowing for the sharing of data, such as for facilitating the collaboration of processing; as such, the collaborator 320 may be part of or separate from the local computing resource 100, such as where the collaborator 320 is coupled to the computing resource 100 via a cloud interface, such as a hybrid cloud 50. The interpreter 310, collaborator 320, and/or the local computing resource 100 may further be coupled to a remote computing resource 300, such as for enhancing system efficiency by offloading computing 300 and/or storage 400 functions into the cloud 50. In various instances, the system 1 may be configured for allowing secure third party analysis 121 to take place, such as where the third party can connect with and engage the system, such as through a suitably configured API.

As can be seen with respect to FIG. 43, the system 1 may be a multi-tiered and/or multiplexed bioanalytical processing platform that includes layers of data generating and/or data processing units, each having one or more processing pipelines that may be deployed in a systematic and concurrent or sequential manner so as to process genetic information from its primary processing stage to a secondary and/or tertiary processing stage. Particularly, presented herein are devices configured for performing bioanalysis in one or more of hardware and/or software implementations, as well as methods of their use, and systems including the same. For instance, in one embodiment, a genomics processing platform may be provided and configured as a multiplicity of integrated circuits, which integrated circuits may be adapted as, or otherwise be included within, one or more of a central or graphics processing unit, such as a general purpose CPU and/or GPU, a hardwired implementation, and/or a quantum processing unit. Particularly, in various embodiments, one or more pipelines of the genomics processing platform may be configured by one or more quantum circuits of a quantum processing unit.

Accordingly, the platforms herein presented may be configured so as to harness the tremendous power of optimized software and/or hardware and/or quantum processing implementations for the performance of the various genetic sequencing and/or secondary processing functions herein disclosed, which may be run on one or more integrated circuits. Such integrated circuits may be seamlessly coupled together and may further be seamlessly coupled to various other integrated circuits, e.g., CPUs and/or GPUs and/or QPUs, of the system that are configured for running the various software and/or hardwired based applications of tertiary bioanalytical functions.

Particularly, in various embodiments, these processes may be performed by optimized software run on a CPU, GPU, and/or QPU, and/or may be implemented as a firmware configured integrated circuit, which may be part of the same device or separate devices that may be positioned on the same motherboard, on different PCIe cards within the same device, as separate devices in the same facility, and/or located at different facilities. Accordingly, the one or more integrated circuits may be directly coupled together, such as by being physically incorporated into the same motherboard, or into separate motherboards positioned within the same housing and/or otherwise coupled together, or they may be positioned on separate motherboards or PCIe cards that are capable of communicating with one another remotely, such as wirelessly and/or via a networked interface, such as via the cloud. In particular instances, the integrated circuit(s) forming or being a part of the CPU, GPU, and/or QPU, which integrated circuit(s) may be arranged as and/or be a part of the secondary and/or tertiary analytics platform, may be configured so as to form a pipeline of analyses where the various data generated may be fed into and out of, back and forth, between the various integrated circuits, such as in a seamless and/or streaming fashion, so as to expedite the analyses herein.

For instance, in some instances, the various devices for use in accordance with the methods disclosed herein may include, or otherwise be associated with, one or more sequencing devices for performing a sequencing protocol, which sequencing protocol may be performed by software run on a remote sequencer, such as by a Next Gen sequencer, e.g., Illumina's HiSeq Ten, located in a core sequencing facility, such as made accessible via a cloud based interface. In other instances, the sequencing may be performed in a hardwired configuration run on a sequencing chip, such as implemented by Thermo Fisher's Ion Torrent or other sequencer-on-a-chip technologies, where sequencing is performed by use of a semiconductor technology that delivers benchtop next gen sequencing, and/or by an integrated circuit configured as, or to otherwise include, a field effect transistor employing a graphene channel layer. In such instances, where the sequencing is performed by one or more integrated circuits configured as, or to include, a semiconducting sequencing microchip, the chip(s) may be positioned remotely from the one or more other integrated circuits disclosed herein and configured for performing secondary and/or tertiary analytics on the sequenced data, or they may be positioned relatively close to one another so as to be directly coupled together, or at least within the same general proximity of one another, such as within the same facility. In such instances, a sequencing and/or BioIT analytics pipeline may be formed such that the raw sequencing data generated by the sequencer may be rapidly communicated to the other analytic components of the pipeline for direct analysis, such as in a streaming manner.

Further, once the raw sequencing or read data is produced by the sequencing instrument, this data may be transmitted to and be received by an integrated circuit configured for performing various bioanalytic functions on genetic and/or protein sequences, such as with respect to analyzing the generated and/or received DNA, RNA, and/or protein sequence data. This sequence analysis may involve the comparing of a generated or received nucleic acid or protein sequence to one or more databases of known sequences, such as for performing secondary analysis on the received data, and/or in some instances, for performing disease diagnostics, such as where the database of known sequences for performing the comparison may be a database containing morphologically distinct and/or aberrant sequence data, that is, data of genetic samples pertaining to or believed to pertain to one or more diseased states.

Accordingly, in various instances, once isolated and sequenced, the genetic data may be subjected to secondary analysis, which may be performed on the received data, such as for the performance of mapping, aligning, sorting, variant calling, and/or the like, so as to generate mapped and/or aligned data that may then be used to derive one or more VCFs detailing the differences between the mapped and/or aligned genetic sequence and a reference sequence. Particularly, once secondary processing has occurred, the genetic information may then be passed on to one or more tertiary processing modules of the system, such as for further processing thereby, such as to derive therapeutic and/or prophylactic results. More particularly, after variant calling, the mapper/aligner/variant caller may output a standard VCF file that is ready for and may be communicated to an additional integrated circuit for performing tertiary analysis, such as analyses related to a whole genome analysis pipeline, genotyping analysis, micro-array analysis, exome analysis, microbiome analysis, an epigenome analysis, a metagenome analysis, a joint genotyping analysis, a variance analysis, e.g., a GATK analysis, structural variants analysis, somatic variants analysis, and the like, as well as an RNA-sequencing or other genomics analysis.
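
As a hypothetical sketch of that hand-off, the snippet below parses a minimal VCF data line as it might be emitted by the secondary stage and passes it to whichever tertiary pipelines have been registered; the pipeline registry, field handling, and example record are illustrative only and do not reflect any specific disclosed interface.

    # Hypothetical hand-off sketch: parse a VCF data line emitted by the secondary
    # stage and feed it to registered tertiary pipelines. Names are illustrative.
    VCF_FIELDS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]

    def parse_vcf_line(line):
        return dict(zip(VCF_FIELDS, line.rstrip("\n").split("\t")))

    def run_tertiary(variant, pipelines):
        return {name: fn(variant) for name, fn in pipelines.items()}

    pipelines = {
        "joint_genotyping": lambda v: f"queued {v['CHROM']}:{v['POS']}",
        "somatic":          lambda v: "somatic review" if v["FILTER"] == "PASS" else "skip",
    }
    record = parse_vcf_line("chr1\t12345\t.\tA\tG\t50\tPASS\tDP=10")
    print(run_tertiary(record, pipelines))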

Hence, the bioanalytic, e.g., BioIT, platform herein presented may include highly optimized algorithms for mapping, aligning, sorting, duplicate marking, haplotype variant calling, compression, and/or decompression, such as in a software, hardwired, and/or quantum processing configuration. For example, although one or more of these functions may be configured to be performed entirely or partially in a hardwired configuration, in particular instances, the tertiary processing platform may be configured for running one or more software and/or quantum processing applications, such as one or more programs directed at performing one or more bioanalytics functions, such as one or more of the functions disclosed herein below. Particularly, the sequenced and/or mapped and/or aligned and/or otherwise processed data may then be further processed by one or more other highly optimized algorithms for one or more of whole genome analysis, genotyping analysis, micro-array analysis, exome analysis, microbiome analysis, epigenome analysis, metagenome analysis, joint genotyping, and/or a variant, e.g., GATK, analysis, such as implemented by software being run on a general purpose CPU and/or GPU and/or QPU.

Accordingly, as can be seen with reference to FIG. 43, in various embodiments, the multiplexed bioanalytical processing platforms are configured for performing one or more of primary, secondary, and/or tertiary processing. For example, the primary processing stage produces genetic sequence data, such as in one or more BCL and/or FASTQ files, for transfer into the system 1. Once within the system 1, the sequenced genetic data, including any associated metadata, may be advanced to a secondary processing stage 600, so as to produce one or more variant call files. Hence, the system may also be configured to take the one or more variant call files, along with any associated metadata and/or other associated processed data, and, in one or more tertiary processing stages, may perform one or more other operations thereon, such as for the purposes of performing one or more diagnostic and/or prophylactic and/or therapeutic procedures therewith.

Particularly, an analysis of the data may be initiated, e.g., in response to a third-party request 121, and/or in response to data submitted by the third party 121, and/or data automatically retrieved from a local 200 and/or remote 400 storage facility. Such further processing may include a first tier of processing wherein various pipeline run protocols 700 are configured to perform analytics on the determined genetic, e.g., variation, data of one or more subjects. For instance, a first tier of tertiary processing units may include a genomics processing platform that is configured to perform genome, epigenome, metagenome, genotyping, and/or various variant analysis, and/or other bioinformatics based analysis. Additionally, in a second tertiary processing tier, various disease diagnostic, research, and/or analysis protocols 800 may be performed, which analysis may include one or more of NIPT, NICU, cancer, LDT, biological, AgBio applications, and the like.

The system 1 may further be adapted so as to receive and/or transmit various data 900 related to the procedures and processes herein disclosed, such as data related to electronic medical records (EMR), Food and Drug Administration (FDA) testing and/or structuring data, data relevant to annotation, and the like. Such data may be useful so as to allow a user to make and/or allow access to generated medical, diagnostic, therapeutic, and/or prophylactic modalities developed through use of the system 1 and/or made accessible thereby. Accordingly, in various instances, the devices, methods, and systems presented herein allow for the secure performance of genetic and bioanalytic analysis, as well as for the secure transfer of the results thereof, in a forum that may be easily usable for downstream processing.

Particularly, the first tertiary processing tier 700 may include one or more genomics processing platforms, such as for performing genetics analysis, such as on mapped and/or aligned data, e.g., in a SAM or BAM file format, and/or for processing variant data, such as in a VCF format. For instance, the first tertiary processing platform may include one or more of a genome pipeline, an epigenome pipeline, a metagenome pipeline, a joint genotyping pipeline, as well as one or more variant analysis pipelines, including: a GATK pipeline, a structural variant pipeline, a somatic variant calling pipeline, and, in some instances, may include an RNA-sequencing analysis pipeline. One or more other genomic analysis pipelines may also be included.

More specifically, with reference to FIG. 43, in various instances, the multi-tiered and/or multiplexed bioanalytical processing platform includes a further layer of data generation and/or processing units. For instance, in certain instances, the bioanalytical processing platform incorporates one or more processing pipelines, in one or more of software and/or hardware implementations, that are directed to performing one or more tertiary processing protocols. For example, in particular instances, a platform of tertiary processing pipelines 700 may include one or more of a genome pipeline, an epigenome pipeline, a metagenome pipeline, a joint genotyping pipeline, a variance pipeline, such as a GATK pipeline, and/or other pipelines, such as an RNA pipeline.

It is to be noted that, with respect to FIGS. 40 and 43, one or more, e.g., all, of these functions may therefore be performed locally, e.g., on site 10, on the cloud 30, or via controlled access through the hybrid cloud 50. In such an instance, a developer environment is created that allows a user to control the functionality of the system 1 to meet his or her individual needs and/or to allow access thereto for others seeking the same or similar results. Consequently, the various components, processes, procedures, tools, tiers, and hierarchies of the system may be configurable, such as via a GUI interface, which allows the user to select which components of the system are to be run, on which data, at what time, and in what order, in accordance with the user determined desires and protocols, so as to generate relevant data and connections between data that may be securely communicated throughout the system, whether locally or remotely. As indicated, these components can be made to communicate seamlessly together, e.g., regardless of location and/or how connected, such as by being in a tightly coupled configuration and/or a seamless cloud based coupling, and/or by being configurable, e.g., via a JIT protocol, so as to run the same or similar processes in the same or similar manner, such as by employing corresponding API interfaces dispersed throughout the system, the employment of which allows the various users to configure the various components to run the various procedures in like manner.

For instance, an API may be defined in a header file with respect to the processes to be run by each particular component of the system 1, wherein the header describes the functionality and determines how to call a function, such as the parameters that are passed, the inputs received and outputs transmitted, and the manner in which this occurs, i.e., what comes in and how, what goes out and how, and what gets returned, and in what manner. For example, in various embodiments, one or more of the components and/or elements thereof, which may form one or more pipelines of one or more tiers of the system, may be configurable, such as by instructions entered by a user and/or one or more second and/or third party applications. These instructions may be communicated to the system via the corresponding APIs, which communicate with one or more of the various drivers of the system, instructing the driver(s) as to which parts of the system, e.g., which modules and/or which processes thereof, are to be activated, when, and in what order, given a preselected parameter configuration, which may be determined by a user selectable interface, e.g., GUI.
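
As a purely illustrative sketch of such an interface definition, rendered here as a Python stub module rather than an actual header file, the callable functions, their parameters, and their return values might be declared as follows; all names, signatures, and file-path conventions are hypothetical.

    # Hypothetical interface stub describing how each pipeline function is called:
    # what is passed in, what comes back, and in what form. Names are illustrative.
    from typing import List

    def map_reads(fastq_path: str, hash_table_path: str) -> str:
        """Map reads against a prebuilt hash table; returns the path of the mapped output."""
        raise NotImplementedError  # declaration only; the driver supplies the implementation

    def align_reads(mapped_path: str, reference_path: str, gapped: bool = True) -> str:
        """Align mapped reads (e.g., Smith-Waterman/Needleman-Wunsch); returns a SAM/BAM path."""
        raise NotImplementedError

    def call_variants(aligned_path: str, reference_path: str) -> List[str]:
        """Run variant calling on aligned reads; returns VCF records as strings."""
        raise NotImplementedError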

Particularly, the one or more DMA drivers of the system 1 may be configured to run in corresponding fashion, such as at the kernel level of each component and of the system 1 as a whole. In such an instance, one or more of the provided kernels may have their own very low level, basic API that provides access to the hardware and functions of the various components of the system 1, so as to access applicable registers and modules and thereby configure and direct the processes and the manners in which they are run on the system 1. Particularly, on top of this layer, a virtual layer of service functions may be built so as to form the building blocks that are used for a multiplicity of functions that send files down to the kernel(s) and get results back, and that encode, encrypt, and/or transmit the relevant data and further perform higher level functions thereon. On top of that layer, an additional layer may be built that uses those service functions, which may be an API level that a user may interface with, and which may be adapted to function primarily for configuration of the system 1 as a whole or its component parts, downloading files, and uploading results, which files and/or results may be transmitted throughout the system either locally or globally.

Such configuration may include communicating with registers and also performing function calls. For example, as described herein above, one or more function calls necessary and/or useful to perform the steps, e.g., sequentially, to execute a mapping and/or aligning and/or sorting and/or variant call, or other secondary and/or tertiary function as herein described, may be implemented in accordance with the hardware operations and/or related algorithms so as to generate the necessary processes and perform the required steps.

Specifically, because in certain embodiments one or more of these operations may be based on one or more structures, the various structures needed for implementing these operations may need to be constructed. There will therefore be a function call that performs this function, which function call will cause the requisite structure to be built for the performance of the operation; accordingly, such a call will accept a file name indicating where the structure parameter files are stored and will then generate one or more data files that contain and/or configure the requisite structure. Another function call may be to load the structure that was generated via the respective algorithm, transfer it down to the memory on the chip and/or system 1, and/or put it at the right spot where the hardware is expecting it to be. Of course, various data will need to be downloaded onto the chip and/or otherwise transferred to the system generator, as well, for the performance of the various other selected functions of the system 1, and the configuration manager can perform these functions, such as by loading everything that needs to be there, in order for the modules of the pipelines of the tiers of the platforms of the chip and/or system as a whole to perform their functions, into a memory on, attached to, or otherwise associated with the chip and/or system.
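
A hypothetical sketch of the two function calls just described, one that builds the requisite structure from a stored parameter file and one that loads the generated structure to where the hardware expects it, might look as follows; the parameter-file layout, file formats, and the notion of a device-memory mapping are assumptions for illustration only.

    # Hypothetical sketch of the build/load function calls described above.
    # The parameter-file format and target memory model are illustrative only.
    import json, pickle

    def build_structure(param_file, out_file):
        """Build the requisite structure (e.g., a seed hash table) from stored parameters."""
        with open(param_file) as fh:
            params = json.load(fh)
        table = {}  # in practice, built from the reference named in the parameters
        with open(out_file, "wb") as fh:
            pickle.dump({"k": params.get("k", 21), "table": table}, fh)
        return out_file

    def load_structure(data_file, device_memory):
        """Load the generated structure into the memory region the hardware expects."""
        with open(data_file, "rb") as fh:
            device_memory["reference_index"] = pickle.load(fh)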

Additionally, the API may be configured to allow one or more chips of the system 1 to interface with the circuit board of the sequencer 121, the computing resource 100/300, transformer 151, analyzer 152, interpreter 310, collaborator 320, or other system component, when included therewith, so as to receive the FASTQ and/or other generated and/or processed genetic sequencing files directly from the sequencer or other processing component, such as immediately once they have been generated and/or processed, and then to transfer that information to the configuration manager, which then directs that information to the appropriate memory banks in the hardware and/or software, thereby making that information available to the pertinent modules of the hardware, software, and/or system as a whole, so that they can perform their designated functions on that information, such as to call bases, map, align, sort, etc. the sample DNA/RNA with respect to the reference genome, and/or to run associated secondary and/or tertiary processing operations thereon.

Accordingly, in various embodiments, a client level interface (CLI) may be included, wherein the CLI may allow the user to call one or more of these functions directly. In various embodiments, the CLI may be a software application, e.g., having a GUI, that is adapted to configure the accessibility and/or use of the hardware and/or various other software applications of the system. The CLI, therefore, may be a program that accepts instructions, e.g., arguments, and makes functionality available simply by calling an application program. As indicated above, the CLI can be command line based or GUI (graphical user interface) based. The line based commands happen at a level below the GUI, where the GUI includes a windows based file manager with click-on function boxes that delineate which modules, which pipelines, and which tiers of which platforms will be used, and the parameters of their use. For example, in operation, if instructed, the CLI will locate the reference, will determine if a hash table and/or index needs to be generated, or, if already generated, will locate where it is stored, and will direct the uploading of the generated hash table and/or index, etc. These types of instructions may appear as user options at the GUI that the user can select for the associated chip(s)/system 1 to perform.
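
A minimal command-line sketch of that behavior, written here with Python's argparse purely for illustration, might look like the following; the flag names, the ".idx" suffix convention, and the index-detection logic are assumptions rather than the disclosed interface.

    # Illustrative CLI sketch: locate the reference, build or reuse its index, and
    # direct the upload of the index. Flag names are hypothetical.
    import argparse, os

    def main():
        parser = argparse.ArgumentParser(description="Configure and launch a run")
        parser.add_argument("--reference", required=True, help="reference FASTA path")
        parser.add_argument("--index", help="existing hash table/index, if already generated")
        args = parser.parse_args()

        index = args.index or args.reference + ".idx"
        if not os.path.exists(index):
            print(f"no index found; generating hash table/index at {index}")
            # a build_structure(...)-style call would be invoked here
        print(f"uploading {index} and starting the selected pipeline")

    if __name__ == "__main__":
        main()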

Furthermore, a library may be included, wherein the library may include pre-existing, editable configuration files, such as files oriented to the typical user-selected functioning of the hardware and/or associated software, for instance with respect to a portion or whole genome and/or protein analysis, such as for various analyses including personal medical histories and ancestry analysis, disease diagnostics, drug discovery, therapeutics, and/or one or more of the other analytics, etc. These types of parameters may be preset, such as for performing such analyses, and may be stored in the library. For example, if the platform herein described is employed for NIPT, NICU, Cancer, LDT, AgBio, and related research on a collective level, the preset parameters may be configured differently than if the platform were directed simply to genomic and/or genealogy-based research on an individual level.

More particularly, for specific diagnosis of an individual, accuracy may be an important factor; the parameters of the system may therefore be set to ensure increased accuracy, albeit possibly in exchange for a decrease in speed. For other genomics applications, however, speed may be the key determinant, and the parameters of the system may be set to maximize speed, which may sacrifice some accuracy. Accordingly, in various embodiments, often-used parameter settings for performing different tasks can be preset into the library to facilitate ease of use. Such parameter settings may also include the necessary software applications and/or hardware configurations employed in running the system 1. For instance, the library may contain the code that executes the API, and may further include sample files, scripts, and any other ancillary information necessary for running the system 1. Hence, the library may be configured for compiling the software for running the API as well as various of the executables.
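A hypothetical sketch of such a preset library follows; the preset names and parameter fields are illustrative assumptions showing how accuracy-biased and speed-biased configurations might be stored, retrieved, and edited, and do not reflect actual parameter values of the platform.

```python
# Hypothetical preset library: accuracy-biased vs. speed-biased configurations.
# Names and values are illustrative only.
PRESETS = {
    "individual_diagnosis": {        # favor accuracy over speed
        "mapper_seed_length": 19,
        "max_alignments_per_read": 50,
        "variant_caller": "full_haplotype",
        "run_local_realignment": True,
    },
    "population_screening": {        # favor speed over accuracy
        "mapper_seed_length": 25,
        "max_alignments_per_read": 5,
        "variant_caller": "fast",
        "run_local_realignment": False,
    },
}


def get_preset(name: str, **overrides) -> dict:
    """Fetch a stored preset and apply any user-edited overrides."""
    config = dict(PRESETS[name])
    config.update(overrides)
    return config


# Example: an accuracy-first configuration with one edited parameter.
cfg = get_preset("individual_diagnosis", max_alignments_per_read=100)
```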

Additionally, as can be seen with respect to FIGS. 42C and 43, the system may be configured such that one or more of the system functions may be performed remotely, such as where a system component is adapted to run one or more comparative functions on the data, for example an interpretive function 310 and/or a collaborative function 320. For instance, where an interpretive protocol is employed on the data, the interpretive protocol 312 may be configured to analyze and draw conclusions about the data and/or determine various relationships with respect thereto. One or more other analytical protocols may also be performed, including annotating the data 311, performing a diagnostic 313 on the data, and/or analyzing the data so as to determine the presence or absence of one or more biomarkers 314.

Additionally, where a collaborative protocol is performed, the system 1 may be configured for providing an electronic forum where data sharing 321 may occur, which data sharing protocol may include user-selectable security 324 and/or privacy 322 settings that allow the data to be encrypted and/or password protected, so that the identity and sources of the data may be hidden from a user of the system 1. In particular instances, the system 1 may be configured so as to allow a third-party analyzer 121 to run virtual simulations on the data. Further, once generated, the interpreted data and/or the data subjected to one or more collaborative analyses may be stored either remotely 400 or locally 200 so as to be made available to the remote 300 or local 100 computing resources, such as for further processing and/or analysis.

In another aspect, as can be seen with respect to FIG. 44, a method for using the system to generate one or more data files upon which one or more secondary and/or tertiary processing protocols may be run is provided. For instance, the method may include providing a genomic infrastructure, such as for one or more of onsite, cloud-based, and/or hybrid genomic and/or bioinformatics generation, processing, and/or analysis.

In such an instance, the genomic infrastructure may include a bioinformatics processing platform having one or more memories that are configured to store one or more configurable processing structures for configuring the system so as to be able to perform one or more analytical processing functions on data, such as data including a genomic sequence of interest or processed result data pertaining thereto. The memory may include the genomic sequence of interest to be processed, e.g., once generated and/or acquired, one or more genetic reference sequences, and may additionally include an index of the one or more genetic reference sequences and/or a list of splice junctions pertaining thereto. The system may also include an input having a platform application programming interface (API) for selecting, from a list of options, one or more of the configurable processing structures, such as for configuring the system by selecting which processing functions of the system will be run on the data, e.g., the pre- or post-processed genomic sequences of interest. A graphical user interface (GUI) may also be present, such as operably associated with the API, so as to present a menu by which a user can select which of the available options he or she desires to be run on the data.

The system may be implemented on one or more integrated circuits that may be formed of one or more sets of configurable, e.g., preconfigured and/or hardwired, digital logic circuits that may be interconnected by a plurality of physical electrical interconnects. In such an instance, the integrated circuit may have an input, such as a memory interface, for receiving one or a plurality of the configurable structure protocols, e.g., from the memory, and may further be adapted for implementing the one or more structures on the integrated circuit in accordance with the configurable processing structure protocols. The memory interface of the input may also be configured for receiving the genomic sequence data, which may be in the form of a plurality of reads of genomic data. The interface may also be adapted for accessing the one or more genetic reference sequences and the index(es).

In various instances, the digital logic circuits may be arranged as a set of processing engines that are each formed of a subset of the digital logic circuits. The digital logic circuits and/or processing engines may be configured so as to perform one or more pre-configurable steps of a primary, secondary, and/or tertiary processing protocol so as to generate the plurality of reads of genomic sequence data, and/or for processing the plurality of reads of genomic data, such as according to the genetic reference sequence(s) or other genetic sequence derived information. The integrated circuit may further have an output so as to output result data from the primary, secondary, and/or tertiary processing, such as according to the platform application programming interface (API).

Particularly, in various embodiments, the digital logic circuits and/or the sets of processing engines may form a plurality of genomic processing pipelines, such as where each pipeline may have an input that is defined according to the platform application programming interface so as to receive the result data from the primary and/or secondary processing by the bioinformatics processing platform, and for performing one or more analytic processes thereon so as to produce result data. Additionally, the plurality of genomic processing pipelines may have a common pipeline API that defines a secondary and/or tertiary processing operation to be run on the result data from the primary and/or secondary processed data, such as where each of the plurality of genomic processing pipelines is configured to perform a subset of the secondary and/or tertiary processing operations and to output result data of the secondary and/or tertiary processing according to the pipeline API.

In such instances, a plurality of the genomic analysis applications may be stored in the memory and/or an associated searchable application repository, such as where each of the plurality of genomic analysis applications is accessible via an electronic medium by a computer, such as for execution by a computer processor, so as to perform a targeted analysis of the genomic pre- or post-processed data from the result data of the primary, secondary, and/or tertiary processing, such as by one or more of the plurality of genomic processing pipelines. In particular instances, each of the plurality of genomic analysis applications may be defined by the API and may be configured for receiving the result data of the primary, secondary, and/or tertiary processing, for performing the targeted analysis of the pre- or post-processed genomic data, and for outputting the result data from the targeted analysis to one of one or more genomic databases.
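To make the relationship between the common pipeline API and the analysis applications concrete, the sketch below models a pipeline stage and an analysis application as Python classes; the class and method names are assumptions chosen for illustration and are not defined by the disclosure.

```python
# Hypothetical sketch of a common pipeline API: each pipeline stage consumes
# result data, performs its subset of processing, and emits result data in a
# common form that downstream analysis applications accept. Names are illustrative.
from abc import ABC, abstractmethod


class PipelineStage(ABC):
    """A secondary/tertiary processing operation behind the common pipeline API."""

    @abstractmethod
    def run(self, result_data: dict) -> dict:
        ...


class SortStage(PipelineStage):
    def run(self, result_data: dict) -> dict:
        reads = sorted(result_data["reads"], key=lambda r: r["position"])
        return {**result_data, "reads": reads}


class AnalysisApplication(ABC):
    """A genomic analysis application defined against the same API."""

    @abstractmethod
    def analyze(self, result_data: dict) -> dict:
        ...


def execute(stages: list[PipelineStage], app: AnalysisApplication, data: dict) -> dict:
    for stage in stages:
        data = stage.run(data)       # each stage performs its subset of operations
    return app.analyze(data)         # targeted analysis on the pipeline output
```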

The method may additionally include selecting, e.g., from the menu of the GUI, one or more genomic processing pipelines from the plurality of available genomic processing pipelines of the system; selecting one or more genomic analysis applications from the plurality of genomic analysis applications that are stored in an application repository; and executing, using a computer processor, the one or more selected genomic analysis applications to perform a targeted analysis of genomic data from the result data of the primary, secondary, and/or tertiary processing.

Additionally, in various embodiments, all of mapping, aligning, and sorting may take place on the chip; local realignment, duplicate marking, and base quality score recalibration, and/or one or more of the tertiary processing protocols and/or pipelines, may, in various embodiments, also take place on the chip; and in various instances, various compression protocols, such as SAM and/or BAM and/or CRAM, may also take place on the chip. However, once the primary, secondary, and/or tertiary processed data has been produced, it may be compressed, such as prior to being transmitted, for instance by being sent across the system or up to the cloud, such as for the performance of the variant calling module, or of a secondary, tertiary, and/or other processing platform, such as one including an interpretive and/or collaborative analysis protocol. This may be especially useful given that variant calling, including the tertiary processing thereof, can be a moving target, e.g., there is not one standardized, agreed-upon algorithm that the industry uses.

Hence, different algorithms can be employed, such as by remote users, so as to achieve a different type of result, as desired; as such, having a cloud-based module for the performance of this function may be useful for allowing the flexibility to select which algorithm is useful at any particular moment, as well as for serial and/or parallel processing. Accordingly, any one of the modules disclosed herein can be implemented as either hardware, e.g., on the chip, or software, e.g., on the cloud; but in certain embodiments, all of the modules may be configured so that their functions are performed on the chip, or all of the modules may be configured so that their functions are performed remotely, such as on the cloud, or there may be a mixture of modules wherein some are positioned on one or more chips and some are positioned on the cloud. Further, as indicated, in various embodiments, the chip(s) itself may be configured so as to function in conjunction with, and in some embodiments in immediate operation with, a genetic sequencer, such as an NGS and/or sequencer on a chip.

More specifically, in various embodiments, an apparatus of the disclosure may be a chip, such as a chip that is configured for processing genomics data, such as by employing a pipeline of data analysis modules. Accordingly, as can be seen with respect to FIG. 45, a genomics pipeline processor chip 100 is provided along with associated hardware of a genomics pipeline processor system 10. The chip 100 has one or more connections to external memory 102 (at "DDR3 Mem Controller"), and a connection 104 (e.g., PCIe or QPI interface) to the outside world, such as a host computer 1000, for example. A crossbar 108 (e.g., switch) provides access to the memory interfaces for various requestors. DMA engines 110 transfer data at high speeds between the host and the processor chip's 100 external memories 102 (via the crossbar 108), and/or between the host and a central controller 112. The central controller 112 controls chip operations, especially coordinating the efforts of multiple processing engines 13. The processing engines are formed of a set of hardwired digital logic circuits that are interconnected by physical electrical interconnects, and are organized into engine clusters 11/114. In some implementations, the engines 13 in one cluster 11/114 share one crossbar port, via an arbiter 115. The central controller 112 has connections to each of the engine clusters. Each engine cluster 11/114 has a number of processing engines 13 for processing genomic data, including a mapper 120 (or mapping module), an aligner 122 (or aligning module), and a sorter 124 (or sorting module); one or more processing engines for the performance of other functions, such as variant calling, may also be provided. Hence, an engine cluster 11/114 can include other engines or modules, such as a variant caller module, as well.

In accordance with one data flow model consistent with implementations described herein, the host CPU/GPU 1000 sends commands and data via the DMA engines 110 to the central controller 112, which load-balances the data across the processing engines 13. The processing engines return processed data to the central controller 112, which streams it back to the host via the DMA engines 110. This data flow model is suited for mapping, alignment, and variant calling. As indicated, in various instances, communication with the host CPU/GPU may be through a relatively loose or tight coupling, such as a low latency, high bandwidth interconnect, such as a QPI, so as to maintain cache coherency between associated memory elements of the two or more devices. It is to be noted that, in various instances, the host device may be a Quantum Processing Unit, such as for the sending of instructions and data, as well as for the running of processes consistent with the methods disclosed herein.
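The host-to-controller-to-engines round trip described above can be summarized in a small, hypothetical simulation; the class names and the round-robin load-balancing policy are assumptions used only to illustrate the flow, not the controller's actual scheduling algorithm.

```python
# Hypothetical simulation of the data flow model: host -> DMA -> central
# controller -> engines -> controller -> DMA -> host. Round-robin dispatch is
# an illustrative assumption, not the actual load-balancing policy.
from collections import deque


class ProcessingEngine:
    def __init__(self, name: str) -> None:
        self.name = name

    def process(self, block: str) -> str:
        return f"{block}:processed_by_{self.name}"


class CentralController:
    def __init__(self, engines: list[ProcessingEngine]) -> None:
        self.engines = deque(engines)

    def dispatch(self, blocks: list[str]) -> list[str]:
        results = []
        for block in blocks:
            engine = self.engines[0]
            self.engines.rotate(-1)          # round-robin load balancing
            results.append(engine.process(block))
        return results


# Host side: send blocks "via DMA" to the controller and stream results back.
controller = CentralController([ProcessingEngine(f"engine{i}") for i in range(4)])
host_results = controller.dispatch([f"read_block_{i}" for i in range(8)])
```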

For instance, in various instances, due to power and/or space constraints, such as when performing big data analytics, e.g., mapping/aligning/variant calling in a hybrid software/hardware accelerated environment as described herein, where data needs to be moved both rapidly and seamlessly between system devices, a cache coherent, tightly coupled interface may be useful for performing such data transmissions throughout the system to and from the coupled devices, such as to and from the sequencer, DSP (digital signal processor), CPU and/or GPU or CPU/GPU hybrid, accelerated integrated circuit, e.g., FPGA, ASIC (on a network card), a quantum processing unit, as well as other smart network accelerators, in a rapid, cache-coherent manner. In such instances, a suitable cache coherent, tight-coupling interconnect may be one or more of a single interconnect technology specification that is configured to ensure that processing, such as between a multiplicity of processing platforms using different instruction set architectures (ISA), can coherently share data between the different platforms and/or with one or more associated accelerators, e.g., a hardwired FPGA-implemented accelerator, so as to enable efficient heterogeneous computing and thereby significantly improve the computing efficiency of the system, which in various instances may be configured as a cloud-based server system. Hence, in certain instances, a high bandwidth, low latency, cache coherent interconnect protocol, such as QPI, Coherent Accelerator Processor Interface (CAPI), NVLink/GPU, or another suitable interconnect protocol, may be employed so as to expedite the various data transmissions between the various components of the system, such as those pertaining to the mapping, aligning, and/or variant calling compute functions that may involve the use of acceleration engines whose functioning requires the ability to access, process, and move data seamlessly among the various system components, irrespective of where the data to be processed resides in the system. And, where such data is retained within an associated memory device, such as a RAM or DRAM, the transmission activities may further involve expedited and coherent search and in-memory database processing.

Particularly, in particular embodiments, such heterogeneous computing may involve a multiplicity of processing and/or acceleration architectures that may be interconnected in a reduced instruction set computing format. In such an instance, the interconnect device may be a coherent connect interconnect six (CCVI) device, which is configured to allow all computing componentry within the system to address, read, and/or write to one or more associated memories in a single, consistent, and coherent manner. More particularly, a CCVI interconnect may be employed so as to connect various of the devices of the system, such as the CPU and/or GPU, or CPU/GPU hybrid, FPGA/ASIC, QPU, and/or associated memories, etc., one with the other, such as in a high bandwidth manner that is configured to increase transfer rates between the various components while exhibiting extremely reduced latency. Specifically, a CCVI interconnect may be employed and configured so as to allow components of the system to access and process data irrespective of where the data resides, and without the need for the complex programming environments that would otherwise have to be implemented to make the data coherent. Other such interconnects that may be employed so as to speed up, e.g., decrease, processing time and increase accuracy include QPI, CAPI, NVLink, or another interconnect that may be configured to interconnect the various components of the system and/or to ride on top of an associated PCI-Express peripheral interconnect.

Hence, in accordance with an alternative data flow model consistent with implementations described herein, the host CPU/GPU/QPU 1000 streams data into the external memory 1014, either directly via the DMA engines 110 and the crossbar 108, or via the central controller 112. The host CPU/GPU/QPU 1000 sends commands to the central controller 112, which sends commands to the processing engines 13 instructing them as to what data to process. Because of the tight coupling, the processing engines 13 access input data directly from the external memory 1014 or a cache associated therewith, process it, and write results back to the external memory 1014, such as over the tightly coupled interconnect 3, reporting status to the central controller 112. The central controller 112 either streams the result data back to the host 1000 from the external memory 1014, or notifies the host to fetch the result data itself via the DMA engines 110.

FIG. 46 illustrates a genomics pipeline processor and system 20, showing a full complement of processing engines 13 inside an engine cluster 11/214. The pipeline processor system 20 may include one or more engine clusters 11/214. In some implementations, the pipeline processor system 20 includes four or more engine clusters 11/214. The processing engines 13 or processing engine types can include, without limitation, a mapper, an aligner, a sorter, a local realigner, a base quality recalibrator, a duplicate marker, a variant caller, a compressor, and/or a decompressor. In some implementations, each engine cluster 11/214 has one of each processing engine type. Accordingly, all processing engines 13 of the same type can access the crossbar 208 simultaneously, through different crossbar ports, because they are each in a different engine cluster 11/214. Not every processing engine type needs to be formed in every engine cluster 11/214. Processing engine types that require massive parallel processing or memory bandwidth, such as the mapper (and attached aligner(s)) and sorter, may appear in every engine cluster of the pipeline processor system 20. Other engine types may appear in only one or some of the engine clusters 214, as needed to satisfy their performance requirements or the performance requirements of the pipeline processor system 20.
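The cluster-allocation idea described above, in which bandwidth-hungry engine types are replicated in every cluster while others appear only where needed, can be expressed as a small, hypothetical configuration table; the counts and cluster assignments below are illustrative assumptions, not a specification of the system.

```python
# Hypothetical allocation of engine types across four engine clusters.
# Assignments are illustrative; only the pattern matters: engines that need
# massive parallelism/bandwidth appear in every cluster, others in only a few.
ENGINE_ALLOCATION = {
    "mapper":           {"clusters": [0, 1, 2, 3]},   # every cluster
    "aligner":          {"clusters": [0, 1, 2, 3]},   # every cluster
    "sorter":           {"clusters": [0, 1, 2, 3]},   # every cluster
    "local_realigner":  {"clusters": [0, 2]},
    "bqsr":             {"clusters": [1]},
    "duplicate_marker": {"clusters": [3]},
    "variant_caller":   {"clusters": [0, 1]},
    "compressor":       {"clusters": [2]},
}


def engines_in_cluster(cluster_id: int) -> list[str]:
    """List the engine types instantiated in a given cluster."""
    return [name for name, cfg in ENGINE_ALLOCATION.items()
            if cluster_id in cfg["clusters"]]
```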

FIG. 47 illustrates a genomics pipeline processor system 30, showing, in addition to the engine clusters 11 described above, one or more embedded central processing units (CPUs) 302. Examples of such embedded CPUs include Snapdragon® or standard ARM® cores, or in other instances the embedded processor may be an FPGA. These CPUs execute fully programmable bio-IT algorithms, such as advanced variant calling, e.g., the building of a DBG or the performance of an HMM. Such processing is accelerated by computing functions in the various engine clusters 11, which can be called by the CPU cores 302 as needed. Furthermore, even engine-centric processing, such as mapping and alignment, can be managed by the CPU cores 302, giving them heightened programmability.

FIG. 48 illustrates a processing flow for a genomics pipeline processor system and method. In some preferred implementations, there are three passes over the data. The first pass includes mapping 402 and alignment 404, with the full set of reads streamed through the engines 13. The second pass includes sorting 406, where one large block to be sorted (e.g., a substantial portion or all reads previously mapped to a single chromosome) is loaded into memory, sorted by the processing engines, and returned to the host. The third pass includes the downstream stages (local realignment 408, duplicate marking 410, base quality score recalibration (BQSR) 412, SAM output 414, reduced BAM output 416, and/or CRAM compression 418). The steps and functions of the third pass may be done in any combination or subcombination, and in any order, in a single pass. Hence, in this manner, data is passed relatively seamlessly from the one or more processing engines to the host CPU/GPU/QPU, such as in accordance with one or more of the methodologies described herein. A virtual pipeline architecture, such as described above, is used to stream reads from the host into circular buffers in memory, through one processing engine after another in sequence, and back out to the host. In some implementations, CRAM decompression can be a separate streaming function. In some implementations, the SAM output 414, reduced BAM output 416, and/or CRAM compression 418 can be replaced with variant calling, compression, and decompression.
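A compact, hypothetical orchestration of those three passes is sketched below; the function names and the per-position duplicate check are assumptions for illustration only and do not represent the system's actual scheduling or algorithms.

```python
# Hypothetical three-pass orchestration: (1) map/align streamed reads,
# (2) sort one large block (e.g., per chromosome), (3) downstream stages
# combined in a single pass. Function names are illustrative stand-ins.
def map_and_align(reads):
    return [{"read": r, "position": hash(r) % 1_000_000} for r in reads]   # pass 1


def sort_block(mapped_reads):
    return sorted(mapped_reads, key=lambda m: m["position"])               # pass 2


def downstream(sorted_reads):
    # pass 3: e.g., duplicate marking (shown) plus realignment, BQSR, and
    # output, all combined in one pass over the sorted data
    out, seen_positions = [], set()
    for rec in sorted_reads:
        rec = dict(rec)
        rec["duplicate"] = rec["position"] in seen_positions
        seen_positions.add(rec["position"])
        out.append(rec)
    return out


reads = ["ACGT", "TTGA", "ACGT", "GGCC"]
result = downstream(sort_block(map_and_align(reads)))
```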

In various instances, a hardware implementation of a sequence analysis pipeline is described. This can be done in a number of different ways, such as an FPGA, ASIC, or structured ASIC implementation. The functional blocks that are implemented by the FPGA, ASIC, or structured ASIC are set forth in FIG. 49. Accordingly, the system includes a number of blocks or modules to perform sequence analysis. The input to the hardware realization can be a FASTQ file, but is not limited to this format. In addition to the FASTQ file, the input to the FPGA, ASIC, or structured ASIC can include side information, such as flow space information from a technology such as the NGS. The blocks or modules may include the following: error control, mapping, alignment, sorting, local realignment, duplicate marking, base quality recalibration, BAM and side information reduction, and/or variant calling.

With respect to FIG. 49, these blocks or modules can be present inside, or implemented by, the hardware, but some of these blocks may be omitted, or other blocks added, to achieve the purpose of realizing a sequence analysis pipeline. Blocks 2 and 3 describe two alternatives of the sequence analysis pipeline platform, the platform comprising an FPGA, ASIC, or structured ASIC and software assisted by a host (e.g., PC, server, cluster, or cloud computing) with cloud and/or cluster storage. Blocks 4-7 describe different interfaces that the sequence analysis pipeline can have. In Blocks 4 and 6 the interface can be a PCIe and/or QPI/CAPI/CCVI/NVLink interface, but is not limited to PCIe, QPI, or other such interfaces. In Blocks 5 and 7 the hardware (FPGA, ASIC, or structured ASIC) can be directly integrated into a sequencing machine. Blocks 8 and 9 describe the integration of the hardware sequence analysis pipeline into a host system such as a PC, server cluster, or sequencer. Surrounding the hardware FPGA, ASIC, or structured ASIC are a plurality of DDR3 memory elements and a PCIe/QPI/CAPI/CCVI/NVLink interface. The board with the FPGA/ASIC/sASIC connects to a host computer, comprising a host CPU, GPU, and/or QPU, which could be either a low power CPU such as an ARM® or Snapdragon® processor, or any other processor. Block 10 illustrates a hardware sequence analysis pipeline API that can be accessed by third party applications to perform tertiary analysis.

FIGS. 50A and 50B depict an expansion card 104 having a processing chip 100, e.g., an FPGA, of the disclosure, as well as one or more associated elements 105 for coupling the FPGA 100 with the host CPU/GPU/QPU, such as for the transferring of data, e.g., data to be processed and result data, back and forth between the CPU/GPU/QPU and the FPGA 100. FIG. 50B depicts the expansion card of FIG. 50A having a plurality, e.g., 3, of slots containing a plurality, e.g., 3, of processing chips of the disclosure.

Specifically, as depicted in FIGS. 50A and 50B, in various embodiments, an apparatus of the disclosure may include a computing architecture, such as one embedded in a silicon field programmable gate array (FPGA) or application specific integrated circuit (ASIC) 100. The FPGA 100 can be integrated into a printed circuit board (PCB) 104, such as a Peripheral Component Interconnect Express (PCIe) card, which can be plugged into a computing platform. In various instances, as shown in FIG. 50A, the PCIe card 104 may include a single FPGA 100, which may be surrounded by local memories 105; however, in various embodiments, as depicted in FIG. 50B, the PCIe card 104 may include a plurality of FPGAs 100A, 100B, and 100C. In various instances, the PCIe card may also include a PCIe bus. This PCIe card 104 can be added to a computing platform to execute algorithms on extremely large data sets. In an alternative embodiment, as noted above with respect to FIG. 34, the FPGA may be adapted so as to be directly associated with the CPU/GPU/QPU, such as via an interposer, and tightly coupled therewith, such as via a QPI, CAPI, or CCVI interface. Accordingly, in various instances, the overall work flow of genomic sequencing involving the FPGA may include the following: sample preparation, alignment (including mapping and alignment), variant analysis, biological interpretation, and/or specific applications.

Hence, in various embodiments, an apparatus of the disclosure may include a computing architecture that achieves the high performance execution of algorithms, such as mapping and alignment algorithms, that operate on extremely large data sets, such as where the data sets exhibit poor locality of reference (LOR). These algorithms, which are designed to reconstruct a whole genome from millions of short read sequences from modern so-called next generation sequencers, require multi-gigabyte data structures that are randomly accessed. Once reconstruction is achieved, as described herein above, further algorithms with similar characteristics are used to compare one genome to libraries of others, perform gene function analysis, etc.

There are three other typical architectures that in general may be constructed for the performance of one or more of the operations herein described in detail, including general purpose multicore CPUs, general purpose graphics processing units (GPGPUs), and/or a quantum processing unit. In such an instance, each CPU/GPU/QPU in a multicore system may have a classical cache based architecture, wherein instructions and data are fetched from a level 1 cache (L1 cache) that is small but has extremely fast access. Multiple L1 caches may be connected to a larger but slower shared L2 cache. The L2 cache may be connected to a large but slower DRAM (Dynamic Random Access Memory) system memory, or may be connected to an even larger but slower L3 cache which may then be connected to DRAM. An advantage of this arrangement is that applications in which programs and data exhibit locality of reference behave nearly as if they are executing on a computer with a single memory as large as the DRAM but as fast as the L1 cache. Because full custom, highly optimized CPUs operate at very high clock rates, e.g., 2 to 4 GHz, this architecture may be essential to achieving good performance. Additionally, as discussed in detail with respect to FIG. 33, in various embodiments the CPU/GPU may be tightly coupled to an FPGA, such as an FPGA configured for running one or more functions related to the various operations described herein, via a high bandwidth, low latency interconnect such as QPI, CCVI, or CAPI, so as to further enhance performance as well as the speed and coherency of the data transferred throughout the system. In such an instance, cache coherency may be maintained between the two devices, as noted above.
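The effect of locality of reference on such a hierarchy can be illustrated with a simple average-memory-access-time model; the latency and hit-rate numbers below are illustrative assumptions only, not measurements of any particular device.

```python
# Illustrative average memory access time (AMAT) for a cache hierarchy.
# Latencies (ns) and hit rates are assumed example values, not measurements.
LEVELS = [
    ("L1",     1.0, 0.95),
    ("L2",     5.0, 0.90),
    ("L3",    20.0, 0.80),
    ("DRAM", 100.0, 1.00),   # backing store: always "hits"
]


def amat(levels) -> float:
    """AMAT = sum over levels of (probability of reaching that level) * latency."""
    reach_prob, total = 1.0, 0.0
    for _name, latency, hit_rate in levels:
        total += reach_prob * latency
        reach_prob *= (1.0 - hit_rate)
    return total


good_locality = amat(LEVELS)   # close to L1 speed when hit rates are high
poor_locality = amat([(n, l, h * 0.5) for n, l, h in LEVELS[:-1]] + [LEVELS[-1]])
```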

Further, GPGPUs may be employed to extend this architecture, such as by implementing very large numbers of small CPUs, each with its own small L1 cache, wherein each CPU executes the same instructions on different subsets of the data. This is a so-called SIMD (Single Instruction stream, Multiple Data stream) architecture. Economy is gained by sharing the instruction fetch and decode logic across a large number of CPUs. Each cache has access to multiple large external DRAMs via an interconnection network. Assuming the computation to be performed is highly parallelizable, GPGPUs have a significant advantage over general purpose CPUs due to having large numbers of computing resources. Nevertheless, they still have a caching architecture, and their performance is hurt by applications that do not have a high enough degree of locality of reference, which leads to a high cache miss rate and to processors that sit idle while waiting for data to arrive from the external DRAM. Additionally, it is to be noted that, in various instances, a Quantum Processing Unit may also be employed for the running of processes consistent with the methods disclosed herein.

For instance, in various instances, dynamic RAMs may be used for system memory because they are more economical than static RAMs (SRAM). The rule of thumb used to be that DRAMs had 4x the capacity for the same cost as SRAMs; due to declining demand for SRAMs in favor of DRAMs, this difference has increased considerably because of the economies of scale that favor DRAMs, which are in high demand. Independent of cost, DRAMs are 4x as dense as SRAMs laid out in the same silicon area because they require only one transistor and capacitor per bit, compared to the 4 transistors per bit needed to implement the SRAM's flip-flop. The DRAM represents a single bit of information as the presence or absence of charge on a capacitor.

A problem with this arrangement is that the charge decays over time, so it has to be refreshed periodically. The need to do this has led to architectures that organize the memory into independent blocks, and to access mechanisms that deliver multiple words of memory per request. This compensates for times when a given block is unavailable while being refreshed: the idea is to move a lot of data while a given block is available. This is in contrast to SRAMs, in which any location in memory is available in a single access in a constant amount of time, a characteristic that allows memory accesses to be single-word oriented rather than block oriented. DRAMs work well in a caching architecture because each cache miss leads to a block of memory being read in from the DRAM. The theory of locality of reference is that if a program has just accessed word N, it will probably access words N+1, N+2, N+3, and so on, soon thereafter.
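The contrast between block-oriented DRAM access and the sequential-access assumption behind locality of reference can be sketched as a toy cache-line model; the block size and access patterns below are illustrative assumptions.

```python
# Toy model: a miss fetches a whole block (cache line) from "DRAM", so a
# sequential scan that touches word N, N+1, N+2, ... misses only once per
# block. Block size and workloads are illustrative assumptions.
BLOCK_WORDS = 8


def count_misses(addresses: list[int]) -> int:
    cached_blocks: set[int] = set()
    misses = 0
    for addr in addresses:
        block = addr // BLOCK_WORDS
        if block not in cached_blocks:
            misses += 1                 # block read in from DRAM on a miss
            cached_blocks.add(block)
    return misses


sequential = list(range(64))                       # words N, N+1, N+2, ... (good locality)
scattered = [(i * 37) % 509 for i in range(64)]    # poor locality
misses_seq, misses_rand = count_misses(sequential), count_misses(scattered)
```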

FIG. 51 provides an exemplary implementation of a system 500 of the disclosure, including one or more of the expansion cards of FIG. 50, such as for bioinformatics processing 10. The system includes a Bio IT processing chip 100 that is configured for performing one or more functions in a processing pipeline, such as base calling, error correction, mapping, alignment, sorting, assembly, variant calling, and the like, as described herein.

The system 500 further includes a configuration manager that is adapted for configuring the onboard functioning of the one or more processors 100. Specifically, in various embodiments, the configuration manager is adapted to communicate instructions to the internal controller of the FPGA, e.g., firmware, such as by a suitably configured driver over a loosely or tightly coupled interconnect, so as to configure one or more processing functions of the system 500. For instance, the configuration manager may be adapted to configure the internal processing clusters 11 and/or the engines 13 associated therewith so as to perform one or more desired operations, such as mapping, aligning, sorting, variant calling, and the like, in accordance with the instructions received. In such a manner, only the clusters 11 containing the processing engines 13 for performing the requested processing operations on the data provided from the host system 1000 to the chip 100 need be engaged to process the data in accordance with the received instructions.

Additionally, in various embodiments, the configuration manager may further be adapted so as to itself be configurable, e.g., remotely, by a third party user, such as over an API connection as described in greater detail herein above, for example by way of a user interface (GUI) presented by an App of the system 500. Additionally, the configuration manager may be connected to one or more external memories, such as a memory forming or otherwise containing a database, for instance a database including one or more reference or individually sequenced genomes and/or an index thereof, and/or one or more previously mapped, aligned, and/or sorted genomes or portions thereof. In various instances, the database may further include one or more genetic profiles characterizing a diseased state, such as for the performance of one or more tertiary processing protocols, e.g., upon newly mapped, aligned genetic sequences or a VCF pertaining thereto.

The system 500 may also include web-based access so as to allow remote communications, such as via the internet, so as to form a cloud or at least a hybrid cloud 504 communications platform. In such a manner, the processed information generated from the Bio IT processor, e.g., results data, may be encrypted and stored as an electronic health record, such as in an external, e.g., remote, database. In various instances, the EMR database may be searchable, such as with respect to the genetic information stored therein, so as to perform one or more statistical analyses on the data, for example to determine diseased states or trends, or for the purposes of analyzing the effectiveness of one or more prophylactics or treatments pertaining thereto. Such information, along with the EMR data, may then be further processed and/or stored in a further database 508 in a manner that ensures the confidentiality of the source of the genetic information.

More particularly, FIG. 51 illustrates a system 500 for executing a sequence analysis pipeline on genetic sequence data. The system 500 includes a configuration manager 502 that includes a computing system. The computing system of the configuration manager 502 can include a personal computer or other computer workstation, or can be implemented by a suite of networked computers. The configuration manager 502 can further include one or more third party applications connected with the computing system by one or more APIs, which, with one or more proprietary applications, generate a configuration for processing genomics data from a sequencer or other genomics data source. The configuration manager 502 further includes drivers that load the configuration to the genomics pipeline processor system 10. The genomics pipeline processor system 10 can output result data to, or be accessed via, the Web 504 or other network, for storage of the result data in an electronic health record 506 or other knowledge database 508.

As discussed in several places herein above, the chip implementing the genomics pipeline processor can be connected to or integrated in a sequencer. The chip can also be connected or integrated, e.g., directly via an interposer, or indirectly, e.g., on an expansion card such as via PCIe, and the expansion card can be connected to or integrated in a sequencer. In other implementations, the chip can be connected to or integrated in a server computer that is connected to a sequencer, to transfer genomic reads from the sequencer to the server. In yet other implementations, the chip can be connected to or integrated in a server in a cloud computing cluster of computers and servers. A system can include one or more sequencers connected (e.g., via Ethernet) to a server containing the chip, where genomic reads are generated by the multiple sequencers, transmitted to the server, and then mapped and aligned in the chip.

For instance, in typical next generation DNA sequencer (NGS) data pipelines, the primary analysis stage processing is generally specific to a given sequencing technology. This primary analysis stage functions to translate physical signals detected inside the sequencer into "reads" of nucleotide sequences with associated quality (confidence) scores, e.g., FASTQ format files, or other formats containing sequence and usually quality information. Primary analysis, as mentioned above, is often quite specific in nature to the sequencing technology employed. In various sequencers, nucleotides are detected by sensing changes in fluorescence and/or electrical charges, electrical currents, or radiated light. Some primary analysis pipelines often include: signal processing to amplify, filter, separate, and measure sensor output; data reduction, such as by quantization, decimation, averaging, transformation, etc.; image processing or numerical processing to identify and enhance meaningful signals, and to associate them with specific reads and nucleotides (e.g., image offset calculation, cluster identification); algorithmic processing and heuristics to compensate for sequencing technology artifacts (e.g., phasing estimates, cross-talk matrices); Bayesian probability calculations; hidden Markov models; base calling (selecting the most likely nucleotide at each position in the sequence); base call quality (confidence) estimation; and the like. As discussed herein above, one or more of these steps may benefit from implementing one or more of the necessary processing functions in hardware, such as implemented by an integrated circuit, e.g., an FPGA. Further, after such a format is achieved, secondary analysis proceeds, as described herein, to determine the content of the sequenced sample DNA (or RNA, etc.), such as by mapping and aligning reads to a reference genome, sorting, duplicate marking, base quality score recalibration, local re-alignment, and variant calling. Tertiary analysis may then follow, to extract medical or research implications from the determined DNA content.
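As a toy illustration of the last two primary-analysis steps named above, base calling and base-call quality estimation, the sketch below picks the most likely nucleotide per position from per-channel signal intensities and converts its probability to a Phred-style score; the intensity values and the simple normalization are assumptions for illustration and do not represent any particular sequencing chemistry.

```python
# Toy base-calling sketch: per position, choose the nucleotide with the highest
# normalized signal and report a Phred-scaled quality. Intensities are made up.
import math

CHANNELS = ("A", "C", "G", "T")


def call_base(intensities: dict[str, float]) -> tuple[str, int]:
    total = sum(intensities.values())
    probs = {b: v / total for b, v in intensities.items()}    # naive normalization
    base = max(probs, key=probs.get)
    p_error = max(1.0 - probs[base], 1e-6)
    phred = int(round(-10.0 * math.log10(p_error)))           # quality estimate
    return base, phred


signal = [
    {"A": 900.0, "C": 30.0, "G": 40.0, "T": 30.0},
    {"A": 100.0, "C": 80.0, "G": 600.0, "T": 220.0},
]
read = [call_base(pos) for pos in signal]    # e.g., [("A", 10), ("G", 4)]
```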

Accordingly, given the sequential nature of the above processing functions, it may be advantageous to integrate primary, secondary, and/or tertiary processing acceleration in a single integrated circuit, or in multiple integrated circuits positioned on a single expansion card. This may be beneficial because sequencers produce data that typically requires both primary and secondary analysis so as to be useful, and that may further be used in various tertiary processing protocols, and integrating these analyses in a single device is most efficient in terms of cost, space, power, and resource sharing. Hence, in one particular aspect, the disclosure is directed to a system, such as a system for executing a sequence analysis pipeline on genetic sequence data. In various instances, the system may include an electronic data source, such as a data source that provides digital signals, for instance digital signals representing a plurality of reads of genomic data, where each of the plurality of reads of genomic data includes a sequence of nucleotides. The system may include one or more of a memory, such as a memory storing one or more genetic reference sequences and/or an index of the one or more genetic reference sequences; and/or the system may include a chip, such as an ASIC, FPGA, or sASIC.

One or more aspects or features of the subject matter described herein can be realized in digital electronic circuitry, integrated circuitry, specially designed application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or structured ASIC computer hardware, firmware, software, and/or combinations thereof.

These various aspects or features can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which can be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device. The programmable system or computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

These computer programs, which can also be referred to as programs, software, software applications, applications, components, or code, include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the term "machine-readable medium" refers to any computer program product, apparatus, and/or device, such as for example magnetic discs, optical disks, memory, and Programmable Logic Devices (PLDs), used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor. The machine-readable medium can store such machine instructions non-transitorily, such as for example as would a non-transient solid-state memory or a magnetic hard drive or any equivalent storage medium. The machine-readable medium can alternatively or additionally store such machine instructions in a transient manner, such as for example as would a processor cache or other random access memory associated with one or more physical processor cores.

Additionally, due to the immense growth in data production and acquisition in the 21st Century, a need has developed for increased processing power capable of handling the ever-growing, computationally intense analyses upon which modern development is founded. Supercomputers have been introduced and have been useful for advancing technological development over a wide range of platforms. However, although supercomputing is useful, it has proven to be insufficient for some of the very complex computing problems many of today's technology companies face. Particularly, since the sequencing of the human genome, the technological advancement in the biological arts has been exponential. Nevertheless, in view of the high rate and increased complexity of the raw data produced every day, there has evolved a problematic bottleneck in the processing and analysis of the data generated. Quantum computers have therefore been developed to help resolve this bottleneck. Quantum computing represents a new frontline in computing, providing an entirely new approach to solving the world's most challenging computational needs.

Quantum computing has been known since 1982. For instance, in the International Journal of Theoretical Physics, Richard Feynman theorized a system for performing quantum computing. Specifically, Feynman proposed a quantum system that could be configured for use in simulating other quantum systems in such a manner that the conventional functions of computer processing can be performed more quickly and efficiently. See Feynman, 1982, International Journal of Theoretical Physics 21, pp. 467-488, which is hereby incorporated by reference in its entirety. Particularly, a quantum computer system can be designed so as to exhibit exponential time savings in complex computations. Such controllable quantum systems are commonly known as quantum computers, and have been successfully developed into general purpose processing computers that not only can be used to simulate quantum systems, but can also be adapted for running specialized quantum algorithms. More particularly, complex problems can be modeled in the form of an equation, such as a Hamiltonian, which may be represented in the quantum system in a manner such that the behavior of the system provides information regarding the solution to the equation. See Deutsch, 1985, Proceedings of the Royal Society of London A 400, pp. 97-117, which is hereby incorporated by reference in its entirety. In such instances, solving a model for the behavior of the quantum system may involve solving a differential equation related to the wave-mechanical description of a particle, e.g., the Hamiltonian, of the quantum system.

In essence, quantum computing is a computational system that uses quantum-mechanical phenomena, e.g., superposition and/or entanglement, to perform various calculations on large amounts of data extremely fast. As such, quantum computers are a vast improvement over conventional digital logic computers. Specifically, conventional digital logic circuits function by using binary digital logic gates that are formed through the hardwiring of electronic circuitry on a conductive substrate. In a digital logic circuit, an "on/off" state of a transistor serves as a basic unit of information, e.g., a bit. Particularly, a common digital computer processor employs binary digits, e.g., bits, in an "on" or "off" state, e.g., as a 0 or 1, to encode data. Quantum computation, on the other hand, employs an information device that uses superpositions of entangled states, called quantum bits or qubits, to encode data.

The basis for performing such quantum computations is an information device, e.g., a unit, which forms the quantum bit. The qubit is analogous to the digital "bit" in traditional digital computers, except that the qubit has far more computational potential than a digital bit. Particularly, as described in greater detail herein, instead of only encoding one of two discrete states, like a "0" and a "1," as found in a digital bit, a qubit can also be placed in a superposition of "0" and "1." Specifically, the qubit can exist in both the "0" and "1" state at the same time. Consequently, the qubit can perform a quantum computation on both states simultaneously. In general, N qubits can be in a superposition of 2^(N) states. Quantum algorithms, therefore, can make use of this superposition property to speed up certain computations.
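A small numerical illustration of that 2^(N)-dimensional state space follows, using NumPy (assumed available) purely to show how an N-qubit register can be represented as a vector of 2^(N) complex amplitudes; this is a classical simulation for illustration, not the disclosure's quantum hardware.

```python
# Illustrative classical simulation: an N-qubit register is a length-2**N
# complex amplitude vector; an equal superposition assigns each basis state
# the amplitude 1/sqrt(2**N). NumPy is assumed to be available.
import numpy as np

N = 3                                         # number of qubits
dim = 2 ** N                                  # 2^(N) basis states

state = np.zeros(dim, dtype=complex)
state[0] = 1.0                                # start in |000>

equal_superposition = np.full(dim, 1.0 / np.sqrt(dim), dtype=complex)
probabilities = np.abs(equal_superposition) ** 2   # each basis state: 1/2^(N)
```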

A qubit, therefore, is analogous to a bit in a traditional digital computer, and is a type of information device that exhibits coherence. Particularly, a quantum computing device is built up from a plurality of information device (e.g., qubit) building blocks. For instance, the computing power of a quantum computer increases as the information devices that form its building blocks are coupled, e.g., entangled, together in a controllable manner. In such an instance, the quantum state of one information device affects the quantum state of each of the other information devices to which it is coupled.

Accordingly, like the bit in classical digital computing, the qubit in quantum computing serves as the basic unit for the encoding of information, such as quantum information. Similar to a bit, the qubit encodes data in a two-state system, which in this instance is a quantum-mechanical system. Specifically, for the qubit, the two quantum states involve entanglement, such as involving the polarization of a single photon. Hence, where in a classical system a bit has to be in one state or the other, in a quantum computing platform the qubit may be in a superposition of both states at the same time, a property that is fundamental to quantum processing. Consequently, a distinguishing feature between the qubit and the classical bit is that multiple qubits can exhibit quantum entanglement. Such entanglement is a nonlocal property that allows a set of qubits to express higher correlation than is possible in a classical system.

In order to function, such information devices, e.g., quantum bits, must fulfill several requirements. First, the information device must be reducible to a quantum two-level system, which means that the information device must have two distinguishable quantum states that may be used for performing computations. Second, the information devices must be capable of producing quantum effects like entanglement and superposition. Additionally, in certain instances, the information device may be configured for storing information, e.g., quantum information, such as in a coherent form. In such instances, the coherent device may have a quantum state that persists without significant degradation for a long period of time, such as on the order of microseconds or more.

Particularly, quantum entanglement is the physical phenomenon that occurs when a pair or a group of particles are generated or otherwise configured to interact in such a manner that the quantum state of one particle cannot be described independently of the others, despite the space that separates them. Consequently, instead of describing the state of one particle in isolation from the others, a quantum state must be described for the system as a whole. In such instances, the measurements of various physical properties, such as position, momentum, spin, and/or polarization, performed on entangled particles are correlated. For example, if a pair of particles is generated in such a way that their total spin is known to be zero, and one particle is found to have clockwise spin on a certain axis, the spin of the other particle, measured on the same axis, will be found to be counterclockwise, as is to be expected due to their entanglement.

Hence, one particle of an entangled pair simply "knows" what measurement has been performed on the other, and with what outcome, even though there is no known means for such information to have been communicated between the particles, which at the time of measurement may be separated by arbitrarily large distances. Because of this relationship, and unlike classical bits that can only have one value at a time, entanglement allows multiple states to be acted on simultaneously. It is these unique entangled relationships and quantum states that have been capitalized upon for the development of quantum computing.

Accordingly, there are various kinds of physical operations employing pure qubit states that can be performed. For instance, a quantum logic gate can be formed and configured to operate on the basic qubit, where the qubit undergoes a unitary transformation, such as where the unitary transformation corresponds to a rotation, or another quantum phenomenon, of the qubit. In fact, any two-level system can be used as a qubit, such as photons, electrons, nuclear spins, coherent light states, optical lattices, Josephson junctions, quantum dots, and the like. Specifically, a quantum gate is the basis for a quantum circuit operating on a small number of qubits. For instance, a quantum circuit is composed of quantum gates that act on fixed numbers of qubits, such as two or three, or more. Qubits, therefore, are the building blocks of quantum circuits, just as classical logic gates are for conventional digital circuits. Specifically, a quantum circuit is a model for quantum computation in which the computation is a sequence of quantum gates that are reversible transformations on a quantum mechanical analog of an n-bit register. Such analogous structures are referred to as n-qubit registers. Hence, unlike many classical logic gates, quantum logic gates are always reversible.

Particularly, as described herein, a digital logic gate is a physical, wired device that may be implemented using one or more diodes or transistors that act as electronic switches for performing logical operations, e.g., Boolean functions, on one or more binary inputs so as to produce a single binary output. With amplification, logic gates can be cascaded in the same way that Boolean functions can be composed, allowing the construction of a physical model of all of Boolean logic; therefore, all of the algorithms and mathematics that can be described with Boolean logic can be performed by digital logic gates. In a like manner, a cascade of quantum logic gates can be formed for the performance of Boolean logic operations.

Quantum gates are usually represented as matrices. In various implementations, a quantum gate acting on k qubits may be represented by a 2^(k)×2^(k) unitary matrix. In such instances, the number of qubits in the input and output of the gate should be equal, and the action of the gate on a specific quantum state is found by multiplying the vector that represents the state by the matrix representing the gate. Hence, given this configuration, quantum computational operations may be executed on a very small number of quantum bits. For instance, there are quantum algorithms that are configured to run much more complex computations faster than any possible probabilistic classical algorithm. Particularly, a quantum algorithm is an algorithm that runs on a quantum circuit model of computation.
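Continuing the NumPy simulation sketch from above (an illustration only, not the disclosure's hardware), the gate-as-unitary-matrix idea can be shown with the single-qubit Hadamard gate and a two-qubit CNOT, each applied to a state vector by matrix multiplication:

```python
# Gates as unitary matrices acting on state vectors: a k-qubit gate is a
# 2**k x 2**k unitary, and applying it is a matrix-vector product.
import numpy as np

H = (1 / np.sqrt(2)) * np.array([[1, 1],
                                 [1, -1]], dtype=complex)     # 1-qubit Hadamard

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)                # 2-qubit gate

# Start in |00>, put the first qubit in superposition, then entangle with CNOT.
state = np.array([1, 0, 0, 0], dtype=complex)                 # |00>
state = np.kron(H, np.eye(2)) @ state                         # H on the first qubit
state = CNOT @ state                                          # Bell state (|00>+|11>)/sqrt(2)

assert np.allclose(CNOT.conj().T @ CNOT, np.eye(4))           # gates are unitary (reversible)
```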

Where a classical algorithm is a finite sequence of step-by-step instructions or procedures that may be performed by the digital logic circuits of a classic computer, a quantum algorithm is a step-by-step procedure in which each of the steps can be performed on a quantum computer. However, even though dedicated quantum algorithms exist, such as Shor's, Grover's, and Simon's algorithms, all classical algorithms can also be performed on a quantum computer with the correct configurations. The term quantum algorithm is usually reserved for those algorithms that are inherently quantum, e.g., those involving superposition or quantum entanglement. Quantum algorithms may be stated in various models of quantum computation, such as the Hamiltonian oracle model.

Accordingly, just as a classical computer has a memory made up of bits, where each bit is represented by either a "1" or a "0," a quantum computer supports a sequence of qubits, where a single qubit can represent a one, a zero, or any quantum superposition of those two qubit states. Consequently, a pair of qubits can be in any quantum superposition of 4 states, and three qubits can be in any superposition of 8 states. In general, a quantum computer with n qubits can be in an arbitrary superposition of up to 2^(n) different states simultaneously, which compares to a normal computer that can only be in one of these 2^(n) states at any one time. Therefore, qubits can hold exponentially more information than their classical counterparts. In operation, a quantum computer works by setting the qubits in an initial state that represents the problem at hand and then manipulating those qubits with a fixed sequence of quantum logic gates. It is this sequence of quantum logic gates that forms the operations of quantum algorithms. The calculation ends with a measurement, collapsing the system of qubits into one of the 2^(n) pure states, where each qubit is "0" or "1," thereby decomposing into a classical state. Hence, traditional algorithms may also be performed on a quantum computing platform, where the outcome is typically n classical bits of information.
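The final measurement step described above, collapsing the 2^(n)-dimensional superposition into n classical bits according to the squared amplitudes, can be illustrated by sampling from a simulated state vector such as the Bell state built earlier; again, this is a classical illustration with assumed helper names, not quantum hardware.

```python
# Illustrative measurement: sample a basis state with probability |amplitude|^2
# and report it as n classical bits. Purely a classical simulation.
import numpy as np


def measure(state: np.ndarray, shots: int = 1000) -> dict[str, int]:
    n = int(np.log2(state.size))
    probs = np.abs(state) ** 2
    outcomes = np.random.choice(state.size, size=shots, p=probs)
    counts: dict[str, int] = {}
    for idx in outcomes:
        bits = format(int(idx), f"0{n}b")     # collapse to n classical bits
        counts[bits] = counts.get(bits, 0) + 1
    return counts


bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
print(measure(bell))    # roughly half "00" and half "11", never "01" or "10"
```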

In standard notation, the basic states of a qubit are referred to as the "0" and "1" states. However, during quantum computation, the state of a qubit, in general, may be a superposition of these basis states, such that the qubit has a nonzero probability of occupying the "0" basis state and a simultaneous nonzero probability of occupying the "1" basis state. Accordingly, the quantum nature of the qubit is largely derived from its ability to exist in a coherent superposition of basis states, and for the state of the qubit to have a phase. A qubit will retain this ability to exist as a coherent superposition of basis states as long as the qubit is sufficiently isolated from sources of decoherence.

Consequently, to complete a computation using a qubit, the state of the qubit is measured. As indicated above, when a measurement of the qubit is made, the quantum nature of the qubit may be temporarily lost, and the superposition of the basis states may collapse to either the "0" basis state or the "1" basis state, whereupon the qubit regains its similarity to a conventional digital "bit." However, the actual state of the qubit after it has collapsed will depend on the probabilities of the states present immediately prior to the measurement operation. Thus, qubits may be employed to form quantum circuits, which themselves may be configured to form a quantum computer.

There are several general approaches to the design and operation of a quantum computer. One approach that has been put forth is that of a circuit model for quantum computing. Circuit model quantum computing requires long quantum coherence, so the type of information device used in quantum computers that support such an approach may be the qubit, which by definition has long coherence times. Accordingly, the circuit model for quantum computing is based upon the premise that qubits can be formed of and be acted on by logical gates, much like bits, and can be programmed using quantum logic in order to perform calculations, such as Boolean computations. Research has been done to develop qubits that can be programmed to perform quantum logic functions in this manner. For example, see Shor, 2001, arXiv.org:quant-ph/0005003, which is hereby incorporated by reference in its entirety. Likewise, a computer processor may take the form of a quantum processor such as a superconducting quantum processor.

A superconducting quantum processor may include a number of qubits and associated local bias devices, for instance, two, three, or more superconducting qubits. Accordingly, although in various embodiments a computer processor may be configured as a non-traditional superconducting processor, in other embodiments the computer processor may be configured as a traditional superconducting processor. For instance, in some embodiments, a non-traditional superconducting processor may be configured so as to not focus on quantum effects such as superposition, entanglement, and/or quantum tunneling, but may rather operate by emphasizing different principles, such as those principles that govern the operation of classical computer processors. In other embodiments, the computer processor may be configured as a traditional superconducting processor, such as by being adapted to process through various quantum effects, such as superposition, entanglement, and/or quantum tunneling.

Accordingly, in various instances, there may be certain advantages to the implementation of such superconducting processors. Particularly, due to their natural physical properties, superconducting processors in general may be capable of higher switching speeds and shorter computation times than non-superconducting processors, and therefore it may be more practical to solve certain problems on superconducting processors. Further detail and embodiments of exemplary quantum processors that may be used in conjunction with the present devices, systems, and the methods of their use are described in U.S. Ser. Nos. 11/317,838; 12/013,192; 12/575,345; 12/266,378; 13/678,266; and Ser. No. 14/255,561; as well as the various divisionals, continuations, and/or continuations in part thereof; including U.S. Pat. Nos. 7,533,068; 7,969,805; 9,026,574; 9,355,365; 9,405,876; and all of their foreign counterparts, which are hereby incorporated by reference in their entireties.

Further, in addition to the above quantum devices and systems, methods for their use in solving complex computational problems are also presented. For instance, the quantum devices and systems herein disclosed may be employed for controlling the quantum state of one or more information devices and/or systems, in a coherent manner, so as to perform one or more steps in a bioinformatics and/or genomics processing pipeline, such as for the performance of one or more operations in an image processing, base calling, mapping, aligning, sorting, variant calling, and/or other genomics and/or bioinformatics pipeline. In particular embodiments, the one or more operations may include performing a Burrows-Wheeler, Smith-Waterman, and/or an HMM operation.
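
As one hedged, classical illustration of the kind of Burrows-Wheeler operation referenced above, the following Python sketch computes the Burrows-Wheeler transform of a short nucleotide string by the naive sorted-rotations method; the function name and terminator symbol are illustrative assumptions, and the fragment does not reflect any hardwired or quantum implementation described herein.

```python
def burrows_wheeler_transform(text: str, terminator: str = "$") -> str:
    """Return the Burrows-Wheeler transform of `text` (naive, quadratic sketch)."""
    s = text + terminator
    # All cyclic rotations of the terminated string, sorted lexicographically.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The BWT is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)

# Example: transform a short nucleotide string.
print(burrows_wheeler_transform("GATTACA"))   # -> 'ACTGA$TA'
```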

Particularly, solving complex genomics and/or bioinformatics computational problems using a quantum computing device may include generating one or more qubits and using the same to form a quantum logic circuit representation of the computational problem, encoding the logic circuit representation as a discrete optimization problem, and solving the discrete optimization problem using the quantum processor. The representation may be an arithmetic and/or geometric problem for solution by an addition, subtraction, multiplication, and/or division circuit. The discrete optimization problem may be composed of a set of miniature optimization problems, where each miniature optimization problem encodes a respective logic gate from the logic circuit representation. For instance, a mathematical circuit may employ binary representations of factors, and these binary representations may be decomposed to reduce the total number of variables required to represent the mathematical circuit. Accordingly, in accordance with the teachings herein, a computer processor may take the form of a digital and/or an analog processor, for instance, a quantum processor such as a superconducting quantum processor. A superconducting quantum processor may include a number of qubits and associated local bias devices, for instance two or more superconducting qubits, which may be formed into one or more quantum logic circuit representations.

More particularly, in various embodiments, a superconducting integrated circuit may be provided. Specifically, in particular embodiments, such a superconducting integrated circuit may include a first superconducting current path that is disposed in a metal, e.g., first, metal layer. A dielectric, e.g., first dielectric, layer may also be included, such as where at least a portion of the dielectric layer is associated with and/or carried on the first metal layer. A second superconducting current path may also be included and disposed in a second metal layer, such as a metal layer that is carried on or otherwise associated with the first dielectric layer. In such an embodiment, at least a portion of the second superconducting current path may overlay at least a portion of the first superconducting current path. Likewise, a second dielectric layer may also be included, such as where at least a portion of the second dielectric layer is associated with or carried on the second metal layer. Additionally, a third superconducting current path may be included and disposed in a third metal layer that may be associated with or carried on the second dielectric layer, such as where at least a portion of the third superconducting current path may overlay at least a portion of one or both of the first and second superconducting current paths. One or more additional metal layers, dielectric layers, and/or current paths may also be included and configured accordingly.

Further, a first superconducting connection may be positioned between the first superconducting current path and the third superconducting current path, such as where the first superconducting connection extends through both the first dielectric layer and the second dielectric layer. A second superconducting connection may also be included and positioned between the first superconducting current path and the third superconducting current path, such as where the second superconducting connection may extend through both the first dielectric layer and the second dielectric layer. Additionally, at least a portion of the second superconducting current path may be encircled by an outer superconducting current path that may be formed by at least a portion of one or more of the first superconducting current path, at least a portion of the third superconducting current path, and/or the first and second superconducting connections. Accordingly, in such instances, the second superconducting current path may be configured to couple, e.g., inductively couple, a signal to the outer superconducting current path.

In some embodiments, a mutual inductance between the second superconducting current path and the outer superconducting current path may be sub-linearly proportional to a thickness of the first dielectric layer and a thickness of the second dielectric layer. The first and the second superconducting connections may also each include at least one respective superconducting via. Further, in various embodiments, the second superconducting current path may be a portion of an input signal line and one or both of the first and the third superconducting current paths may be coupled to a superconducting programmable device. In other embodiments, the second superconducting current path may be a portion of a superconducting programmable device and both the first and the third superconducting current paths may be coupled to an input signal line. In particular embodiments, the superconducting programmable device may be a superconducting qubit, which may then be coupled, e.g., quantumly coupled, to one or more other qubits so as to form a quantum circuit, such as of a quantum processing device.

Accordingly, provided herein are devices, systems, and methods for solving computational problems, especially problems related to resolving the genomics and/or bioinformatics bottleneck described herein above. In various embodiments, these devices, systems, and methods introduce a technique whereby a logic circuit representation of a computational problem may be solved directly and/or may be encoded as a discrete optimization problem, and the discrete optimization problem may then be solved using a computer processor, such as a quantum processor. For instance, in particular embodiments, solving such discrete optimization problems may include executing the logic circuit to solve the original computational problem.

Hence, the devices, systems, and methods described herein may be implemented using any form of computer processor, such as one including traditional logic circuits and/or logic circuit representations, such as configured for use as a quantum processor and/or in superconducting processing. Particularly, various steps in performing an image processing, base calling, mapping, aligning, and/or variant calling bioinformatics pipeline may be encoded as discrete optimization problems and as such may be particularly well suited to be solved using the quantum processors disclosed herein. In other instances, such computations may be resolved more generally by a computer processor that harnesses quantum effects to achieve such computation; and/or in other instances, such computations may be performed using a dedicated integrated circuit, such as an FPGA, ASIC, or structured ASIC, as described herein in detail. In some embodiments, the discrete optimization problem is cast onto the hardware by configuring the logic circuits, qubits, and/or couplers in a quantum processor. In some embodiments, the quantum processor may be specifically adapted to facilitate solving such discrete optimization problems.

As disclosed throughout this specification and the appended claims, reference is often made to a “logic circuit representation”, e.g., of a computational problem. Depending on the context, a logic circuit may incorporate a set of logical inputs, a set of logical outputs, and a set of logic gates (e.g., NAND gates, XOR gates, and the like) that transform the logical inputs to the logical outputs through a set of intermediate logical inputs and intermediate logical outputs. A complete logic circuit may include a representation of the input(s) to the computational problem, a representation of the output(s) of the computational problem, and a representation of the sequence of intermediate steps in between the input(s) and the output(s).

Thus, for various purposes of the present devices, systems, and methods, the computational problem may be defined by its input(s), its output(s), and the intermediate steps that transform the input(s) to the output(s), and a “logic circuit representation” may include all of these elements. Those of skill in the art will appreciate that the encoding of a “logic circuit representation” of a computational problem as a discrete optimization problem, and the subsequent mapping of the discrete optimization problem to a quantum processor, may result in any number of layers involving any number of qubits per layer. Furthermore, such a mapping may implement any scheme of inter-qubit coupling to enable any scheme of inter-layer coupling (e.g., coupling between the qubits of different layers) and intra-layer coupling (e.g., coupling between the qubits within a particular layer).

Accordingly, as indicated, in some embodiments, the structure of a logic circuit may be stratified into layers. For example, the logical input(s) may represent a first layer, each sequential logical (or arithmetic) operation may represent a respective additional layer, and the logical output(s) may represent another layer. And as previously described, a logical operation may be executed by a single logic gate or by a combination of logic gates, depending on the specific logical operation being executed. Thus, a “layer” in a logic circuit may include a single logic gate or a combination of logic gates depending on the particular logic circuit being implemented.
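
Purely by way of example, the following Python sketch shows one possible in-memory representation of such a layered logic circuit, with the logical inputs as a first layer, a layer of gates, and the logical outputs as a final layer; the class names, wire names, and the half-adder example are hypothetical and are used only to make the layering concrete.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Gate:
    op: Callable[..., int]        # e.g. an AND or XOR acting on 0/1 values
    inputs: Tuple[str, ...]       # names of the wires feeding this gate
    output: str                   # name of the wire this gate drives

@dataclass
class LayeredCircuit:
    inputs: List[str]                                        # first layer: input wires
    layers: List[List[Gate]] = field(default_factory=list)   # one list of gates per layer
    outputs: List[str] = field(default_factory=list)         # final layer: output wires

    def evaluate(self, assignment: Dict[str, int]) -> Dict[str, int]:
        wires = dict(assignment)
        for layer in self.layers:                 # evaluate the circuit layer by layer
            for gate in layer:
                wires[gate.output] = gate.op(*(wires[w] for w in gate.inputs))
        return {w: wires[w] for w in self.outputs}

# A half adder: XOR gives the sum bit, AND gives the carry bit.
xor = lambda a, b: a ^ b
and_ = lambda a, b: a & b
half_adder = LayeredCircuit(
    inputs=["a", "b"],
    layers=[[Gate(xor, ("a", "b"), "sum"), Gate(and_, ("a", "b"), "carry")]],
    outputs=["sum", "carry"],
)
print(half_adder.evaluate({"a": 1, "b": 1}))      # {'sum': 0, 'carry': 1}
```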

Consequently, in various embodiments such as where the structure of a logic circuit stratifies into layers (for example, with the logical input(s) representing a first layer, each sequential logical operation representing a respective additional layer, and the logical output(s) representing another layer), each layer may be embodied by a respective set of qubits in the quantum and/or superconducting processor. For example, in one embodiment of a quantum processor, one or more, e.g., each, row of qubits may be programmed to represent a respective layer of a quantum logic circuit. That is, particular qubits may be programmed to represent the inputs to a logic circuit, other qubits may be programmed to represent a first logical operation (executed by either one or a plurality of logic gates), further qubits may be programmed to represent a second logical operation (similarly executed by either one or a plurality of logic gates), and yet further qubits may be programmed to represent the outputs of the logic circuit.

Additionally, with various sets of qubits representing various layers of the problem, it can be advantageous to enable independent dynamic control of each respective set. Further, in various embodiments, various serial logic circuits may be mapped to the quantum processor, and the respective qubits mapped to facilitate the functional interactions for quantum processing in a manner suitable to enable independent control thereof. From the above, those of skill in the art will appreciate how a similar objective function may be defined for any logic gate. Thus, in some embodiments, the problem representing a logic circuit may essentially be composed of a plurality of miniature optimization problems, where each gate in the logic circuit corresponds to a particular miniature optimization problem.

Hence, exemplary logic circuit representations may be generated using systems and methods that are known in the art. In one example, a logic circuit representation of the computational problem, e.g., the genomics and/or bioinformatics problem, may be generated and/or encoded using a classical digital computer processor and/or a quantum and/or superconducting processor as described herein. Accordingly, a logic circuit representation of the computational problem may be stored in at least one computer- or processor-readable storage medium, such as a computer-readable non-transitory storage medium or memory (e.g., volatile or non-volatile). Therefore, as discussed herein, the logic circuit representation of the computational problem may be encoded as a discrete optimization problem, or a set of optimization objectives. In various embodiments, such as where a classical digital computer processing paradigm is configured to solve the problem, the system may be configured so that bit strings that satisfy the logic circuit have an energy of zero and all other bit strings have an energy greater than zero, such that solving the discrete optimization problem establishes a solution to the original computational problem.
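
As a hedged illustration of this zero-energy convention, the following Python sketch uses a standard penalty (QUBO-style) encoding of a single AND gate, one of the miniature optimization problems referred to above, and enumerates all bit strings to show that exactly the assignments consistent with the gate have energy zero; the particular penalty coefficients are an illustrative textbook choice, not the specific encoding of this disclosure.

```python
from itertools import product

def and_gate_energy(x: int, y: int, z: int) -> int:
    """Penalty (energy) for asserting z = x AND y; zero only for consistent bits."""
    return x * y - 2 * (x + y) * z + 3 * z

# Enumerate all 3-bit strings: the satisfying assignments have energy 0,
# and every inconsistent assignment has energy greater than zero.
for x, y, z in product((0, 1), repeat=3):
    energy = and_gate_energy(x, y, z)
    status = "satisfies z = x AND y" if z == (x & y) else "violates the gate"
    print(f"x={x} y={y} z={z}  energy={energy}  ({status})")
```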

Further, in other embodiments, the discrete optimization problem may be solved using a computer processor, such as a quantum processor. In such an instance, solving the discrete optimization problem may then involve, for example, evolving the quantum processor to the configuration that minimizes the energy of the system in order to establish a bit string that satisfies the optimization objective(s). Accordingly, in some embodiments, the act of solving a discrete optimization problem may include three acts. First, the discrete optimization problem may be mapped to a computer processor. In some embodiments, the computer processor may include a quantum and/or superconducting processor, and mapping the discrete optimization problem to the computer processor may include programming the elements (e.g., qubits and couplers) of the quantum and/or superconducting processor. Mapping the discrete optimization problem to the computer processor may also include storing the discrete optimization problem in at least one computer- or processor-readable storage medium, such as a computer-readable non-transitory storage medium or memory (e.g., volatile or non-volatile).
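
As a classical stand-in for the act of evolving the processor toward a minimum-energy configuration, the following Python sketch minimizes the summed gate penalties of a hypothetical two-gate circuit by simulated annealing; the cooling schedule, the toy circuit, and the function names are illustrative assumptions only, and the fragment is not a description of quantum hardware behavior.

```python
import math
import random

def and_gate_energy(x, y, z):
    """Penalty for z = x AND y, as in the sketch above; zero only when consistent."""
    return x * y - 2 * (x + y) * z + 3 * z

def circuit_energy(bits):
    """Toy two-gate circuit: z = a AND b, then w = z AND c (names hypothetical)."""
    a, b, c, z, w = bits
    return and_gate_energy(a, b, z) + and_gate_energy(z, c, w)

def anneal(energy, n_bits=5, steps=5000, t_start=2.0, t_end=0.01):
    """Classical simulated-annealing stand-in for minimizing a discrete energy."""
    state = [random.randint(0, 1) for _ in range(n_bits)]
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / steps)    # geometric cooling schedule
        candidate = state[:]
        candidate[random.randrange(n_bits)] ^= 1             # propose a single bit flip
        delta = energy(candidate) - energy(state)
        if delta <= 0 or random.random() < math.exp(-delta / t):
            state = candidate
    return state, energy(state)

best_bits, best_energy = anneal(circuit_energy)
print(best_bits, best_energy)   # a zero-energy string encodes a consistent gate assignment
```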

Accordingly, in view of the above, in various instances, a device, system, and method for executing a sequence analysis pipeline, such as on genomics material, is provided. For instance, the genomics material may include a plurality of reads of genomic data, such as in an image file, BCL, FASTQ file, and the like. In various embodiments, the device and/or system may be employed for executing a sequence analysis on genomic data, e.g., reads of genomic data, such as by using an index of one or more genetic reference sequences, e.g., stored in a memory, for example, where each read of genomic data and each reference sequence represents a sequence of nucleotides.

Particularly, in various embodiments, the device may be a quantum computing device, such as formed of a set of quantum logic circuits, e.g., hardwired quantum logic circuits, for instance, where the logic circuits are interconnected with one another. In various instances, the quantum logic circuits may be interconnected by one or more superconducting connections. Additionally, one or more of the superconducting connections may include a memory interface, such as for accessing the memory. Together the logic circuits and interconnects may be configured to process information represented as a quantum state that is itself represented as a set of one or more qubits. More particularly, the set of hardwired quantum logic circuits may be arranged as a set of processing engines, such as where each processing engine may be formed of a subset of the hardwired quantum logic circuits, and may be configured to perform one or more steps in the sequence analysis pipeline on the reads of genomic data.

For instance, the set of processing engines may be configured so as to include an image processing, base calling, mapping, aligning, sorting, variant calling, and/or other genomics and/or bioinformatics processing module. For example, in various embodiments, a mapping module, such as in a first hardwired configuration, may be included. Additionally, in further embodiments, an alignment module, such as in a second hardwired configuration, may be included. Further, a sorting module, such as in a third hardwired configuration, may be included. And, in additional embodiments, a variant calling module, such as in a fourth hardwired configuration, may be included. Further still, in various embodiments, an image processing and/or base calling module may be included in further hardwired configurations, such as where one or more of these hardwired configurations may include hardwired quantum logic circuits that are arranged as a set of processing engines.

More particularly, in particular instances, a quantum computing device and/or system may include a mapping module, where the mapping module comprises a set of quantum logic circuits that are arranged as a set of processing engines, one or more of which are configured for performing one or more steps of a mapping procedure. For instance, one or more quantum processing engines may be configured to receive a read of genomic data, such as via one or more of a plurality of superconducting connections. Further, the one or more quantum processing engines may be configured to extract a portion of the read to generate a seed, such as where the seed may represent a subset of the sequence of nucleotides represented by the read. Additionally, one or more of the quantum processing engines may be configured to calculate a first address within the index based on the seed, and access the address in the index in the memory, so as to receive a record from the address, such as where the record represents position information in the genetic reference sequence. Furthermore, the one or more quantum processing engines may be configured to determine, e.g., based on the record, one or more matching positions from the read to the genetic reference sequence; and output at least one of the matching positions to the memory via the memory interface.
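
For purposes of illustration only, the following classical Python sketch mirrors the mapping flow just described: a seed is extracted from the read, used as an address (here, a hash-table key) into an index of the reference, and the returned record of reference positions is used to propose candidate mapping positions. The seed length, hash scheme, reference string, and function names are assumptions made for the sketch and do not represent the hardwired quantum implementation.

```python
from collections import defaultdict

SEED_LEN = 21   # illustrative seed length (an assumption, not a prescribed value)

def build_index(reference: str, k: int = SEED_LEN) -> dict:
    """Hash-table index: seed k-mer -> list of positions in the reference."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)    # 'record' stored at the seed's address
    return index

def map_read(read: str, index: dict, k: int = SEED_LEN) -> set:
    """Extract seeds from the read and look each one up to collect candidate positions."""
    candidates = set()
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]                # subset of the read's nucleotides
        for ref_pos in index.get(seed, []):           # record(s) returned for the seed
            candidates.add(ref_pos - offset)          # implied read start on the reference
    return candidates

reference = "ACGTACGTTAGCCGATCGATCGGATCCGATTACAGGATTACAGATTACA"
index = build_index(reference)
print(map_read("CGATCGGATCCGATTACAGGA", index))       # candidate mapping position(s)
```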

Further still, the mapping module may include a set of quantum logic circuits that are arranged as a set of processing engines configured for calculating a second address within the index, e.g., based on both the record and a second subset of the sequence of nucleotides that is not contained in the first subset of the sequence of nucleotides. The processing engine(s) may then access the second address in the index in the memory so as to receive a second record from the second address, such as where the second record, or a subsequent record, includes position information in the genetic reference sequence. The processing engine may further be configured for determining, based on the position information, the one or more matching positions from the read to the genetic reference sequence.

Additionally, in various instances, a quantum computing device and/or system may include an alignment module, where the alignment module comprises a set of quantum logic circuits that are arranged as a set of processing engines, one or more of which are configured for performing one or more steps of an alignment procedure. For instance, one or more quantum processing engines may be configured to receive a plurality of mapped positions for the read from the memory, and to access the memory to retrieve a segment of the genetic reference sequence corresponding to each of the mapped positions. The one or more processing engines formed as an alignment module may further be configured to calculate an alignment of the read to each retrieved segment of the genetic reference sequence so as to generate a score for each alignment. Further, once one or more scores have been generated, at least one best-scoring alignment of the read may be selected. In particular instances, the quantum computing device may include a set of quantum logic circuits that are arranged as a set of processing engines that are configured for performing a gapped or gapless alignment, such as a Smith-Waterman alignment.
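
By way of a hedged, classical illustration of such scoring, the following Python sketch computes a Smith-Waterman local-alignment score for a read against each retrieved reference segment and selects the best-scoring mapped position; the match, mismatch, and gap parameters and the example segments are illustrative assumptions rather than the module's configured values.

```python
def smith_waterman_score(read: str, segment: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local-alignment score of `read` vs `segment` (score-only dynamic program)."""
    cols = len(segment) + 1
    prev = [0] * cols
    best = 0
    for i in range(1, len(read) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if read[i - 1] == segment[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

# Score the read against each candidate reference segment and keep the best-scoring one.
read = "ACGTTAGC"
segments = {100: "TTACGTTAGCAA", 250: "ACGGTAGC"}   # mapped position -> reference segment
scores = {pos: smith_waterman_score(read, seg) for pos, seg in segments.items()}
print(max(scores, key=scores.get), scores)          # best-scoring position and all scores
```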

Further, in certain instances, a quantum computing device and/or system may include a variant calling module, where the variant calling module comprises a set of quantum logic circuits that are arranged as a set of processing engines, one or more of which are configured for performing one or more steps of a variant calling procedure. For instance, the quantum computing variant calling module may include a set of quantum logic circuits that are adapted for executing an analysis on a plurality of reads of genomic data, such as using one or more candidate haplotypes, e.g., stored in a memory, where each read of genomic data and each candidate haplotype represent a sequence of nucleotides.

Specifically, the set of quantum logic circuits may be formed as one or more quantum processing engines that are configured to receive one or more of the reads of genomic data and generate and/or receive the one or more candidate haplotypes, e.g., from the memory, such as via one or more of a plurality of superconducting connections. Further, the one or more quantum processing engines may be configured to receive one or more of the reads of genomic data and the one or more candidate haplotypes from the memory, as well as to compare nucleotides in each of the one or more reads to the one or more candidate haplotypes, so as to determine a probability of each candidate haplotype representing a correct variant call. Additionally, one or more of the quantum processing engines may be configured to generate an output based on the determined probability.
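
As a deliberately simplified, classical illustration of comparing read nucleotides to candidate haplotypes and deriving a relative probability for each candidate, the following Python sketch scores each haplotype with a naive per-base error model and a uniform prior; the error rate, the independence assumptions, and the function names are illustrative only, and the fuller Hidden Markov Model treatment is discussed in the next paragraph.

```python
def read_likelihood(read: str, haplotype: str, base_error: float = 0.01) -> float:
    """P(read | haplotype) under a naive per-base error model (illustrative only)."""
    p = 1.0
    for read_base, hap_base in zip(read, haplotype):
        p *= (1.0 - base_error) if read_base == hap_base else (base_error / 3.0)
    return p

def haplotype_posteriors(reads, haplotypes):
    """Relative support for each candidate haplotype given all reads (uniform prior)."""
    support = {h: 1.0 for h in haplotypes}
    for read in reads:
        for h in haplotypes:
            support[h] *= read_likelihood(read, h)
    total = sum(support.values())
    return {h: s / total for h, s in support.items()}

reads = ["ACGTTAGC", "ACGTCAGC", "ACGTTAGC"]
candidates = ["ACGTTAGC", "ACGTCAGC"]          # e.g. reference vs. SNP-bearing haplotype
print(haplotype_posteriors(reads, candidates))
```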

Additionally, in various instances, the set of quantum logic circuits may be formed as one or more quantum processing engines that are configured to determine a probability of observing each read of the plurality of reads based on at least one candidate haplotype being a true sequence of nucleotides, e.g., of a source organism of the plurality of reads. In particular instances, with respect to determining probability, the one or more quantum processing engines may be configured for executing a Hidden Markov Model. More particularly, in additional embodiments, the one or more quantum processing engines may be configured for merging the plurality of reads into one or more contiguous nucleotide sequences, and/or for generating the one or more candidate haplotypes from the one or more contiguous nucleotide sequences. For instance, in various embodiments, the merging of the plurality of reads includes the one or more quantum processing engines constructing a De Bruijn graph.
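
Purely as an illustrative classical sketch of the De Bruijn graph construction mentioned above, the following Python fragment breaks overlapping reads into k-mers and links each k-mer's (k-1)-mer prefix to its (k-1)-mer suffix; following the resulting links from the first node reconstructs the merged contiguous sequence. The value of k, the read set, and the function name are illustrative assumptions.

```python
from collections import defaultdict

def build_de_bruijn_graph(reads, k=4):
    """Edges from each k-mer's (k-1)-mer prefix to its (k-1)-mer suffix."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])     # prefix node -> suffix node
    return graph

# Overlapping reads covering the sequence ACGTTAGCA (illustrative only).
reads = ["ACGTTA", "GTTAGC", "TAGCA"]
graph = build_de_bruijn_graph(reads, k=4)
for node, successors in sorted(graph.items()):
    print(node, "->", ", ".join(successors))
# Following the prefix -> suffix links from ACG spells out the merged sequence ACGTTAGCA.
```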

Accordingly, in light of the above, a system for performing various computations in solving problems related to genomics and/or bioinformatics processing is provided. For instance, the system may include one or more of an onsite automated sequencer, e.g., NGS, and/or a processing server, either or both of which may include one or more CPUs, GPUs, QPUs, and/or other integrated circuits, such as including an FPGA, ASIC, and/or structured ASIC that are configured as herein described for performing one or more steps in a sequence analysis pipeline. Particularly, the Next Gen Sequencer may be configured for sequencing a plurality of nucleic acid sequences so as to generate one or more image, BCL, and/or FASTQ files representing the sequenced nucleic acid sequences, which nucleic acid sequences may be a DNA and/or an RNA sequence. These sequence files may be processed by the sequencer itself or by an associated server unit, such as where the sequencer and/or the associated server includes an integrated circuit, such as an FPGA or ASIC, configured as herein described for performing one or more steps in a secondary sequence analysis pipeline.

However, in various instances, such as where the automated sequencer and/or an associated server is not configured for performing a secondary sequence analysis on the data generated from the sequencer, the generated data may be transmitted to a remote server that is configured for performing a secondary and/or tertiary sequence analysis on the data, such as via a cloud mediated interface. In such an instance, the cloud accessible server may be configured for receiving the generated sequence data, such as in image, BCL, and/or FASTQ form, and may further be configured for performing a primary, e.g., image processing, and/or a secondary and/or tertiary processing analysis, such as a sequence analysis pipeline, on the received data. For instance, the cloud accessible server may be one or more servers including a CPU and/or a GPU and/or a QPU, one or more of which may be associated with an integrated circuit, such as an FPGA or ASIC, as herein described. Particularly, in certain instances, the cloud accessible server may be a quantum computing server, as herein described.

Specifically, the cloud accessible server may be configured for performing a primary, secondary, and/or tertiary genomics and/or bioinformatics analysis on the received data, which analyses may include performing one or more steps in one or more of an image processing, base calling, mapping, aligning, sorting, and/or variant calling protocol. In certain instances, some of the steps may be performed by one processing platform, such as a CPU or GPU or QPU, and others may be performed by another processing platform, such as an associated, e.g., tightly coupled, integrated circuit, such as an FPGA or ASIC, that is specifically configured for performing various of the steps in the sequence analysis pipeline. In such instances, where data and the results of analysis are to be transferred from one platform to another, the system and its components may be configured for compressing the data prior to transfer and decompressing the data once transferred, and as such the system components may be configured for generating one or more SAM, BAM, or CRAM files, such as for transfer. Additionally, in various embodiments, the cloud accessible server may be a quantum computing platform that is configured as described herein to perform one or more steps in the sequence analysis pipeline, and may include the performance of one or more secondary and/or tertiary processing steps in accordance with one or more of the methods disclosed herein.

Further, with respect to quantum computing, detail and embodiments of exemplary quantum processors and the methods of their use that may be employed in conjunction with the present devices, systems, and methods are described in U.S. Pat. Nos. 7,135,701; 7,533,068; 7,969,805; 8,560,282; 8,700,689; 8,738,105; 9,026,574; 9,355,365; 9,405,876; as well as the various counterparts thereto, which are hereby incorporated by reference in their entireties.

To provide for interaction with a user, one or more aspects or features of the subject matter described herein can be implemented on a computer having a display device, such as for example a cathode ray tube (CRT), a liquid crystal display (LCD), or a light emitting diode (LED) monitor for displaying information to the user, and a keyboard and a pointing device, such as for example a mouse or a trackball, by which the user may provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well. For example, feedback provided to the user can be any form of sensory feedback, such as for example visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including, but not limited to, acoustic, speech, or tactile input. Other possible input devices include, but are not limited to, touch screens or other touch-sensitive devices such as single or multi-point resistive or capacitive trackpads, voice recognition hardware and software, optical scanners, optical pointers, digital image capture devices and associated interpretation software, and the like.

The subject matter described herein can be embodied in systems, apparatus, methods, and/or articles depending on the desired configuration. The implementations set forth in the foregoing description do not represent all implementations consistent with the subject matter described herein. Instead, they are merely some examples consistent with aspects related to the described subject matter. Although a few variations have been described in detail above, other modifications or additions are possible. In particular, further features and/or variations can be provided in addition to those set forth herein. For example, the implementations described above can be directed to various combinations and subcombinations of the disclosed features and/or combinations and subcombinations of several further features disclosed above. In addition, the logic flows depicted in the accompanying figures and/or described herein do not necessarily require the particular order shown, or sequential order, to achieve desirable results. Other implementations may be within the scope of the following claims.

What is claimed is:
 1. A system for executing a sequence analysis pipeline on a plurality of reads of genomic data using an index of genetic reference data stored in a memory, each read of genomic data representing a sequence of nucleotides, the genetic reference data representing one or more genetic reference sequences, the system comprising: a quantum computing device formed of a set of hardwired quantum logic circuits interconnected by a plurality of superconducting connections to process information represented as a quantum state that is configured as a set of one or more qubits, one or more of the plurality of superconducting connections comprising a memory interface for accessing the memory, the set of hardwired quantum logic circuits being arranged as a set of processing engines, each processing engine being formed of a subset of the hardwired quantum logic circuits to perform one or more steps in the sequence analysis pipeline on the plurality of reads of genomic data, the set of processing engines comprising a mapping module in a first hardwired configuration to: receive a read of genomic data via the memory interface of the one or more of the plurality of superconducting connections; extract a portion of the read to generate a seed, the seed representing a subset of the sequence of nucleotides represented by the read; calculate a first address within the index based on the seed; access the address in the index in the memory; receive a record from the address, the record representing position information in the genetic reference sequence; determine, based on the record, one or more matching positions from the read to the genetic reference sequence; and output at least one of the matching positions to the memory via the memory interface.